Importing OpenStreetMap data into Postgresql

Fri 14 September 2018 by Thejaswi Puthraya

I had the requirement of extracting hospitals from OpenStreetMap into Postgresql so that some data analysts could run some scripts off it.

I fired up an AWS EC2 server (t3.large) running Ubuntu and installed the awscli package and configured it using my AWS Key and Secret Key.

sudo apt-get install awscli
aws configure

Then I downloaded the PBF file from AWS S3 (you don't incur a charge if downloaded from S3 if you are using EC2)

aws s3 cp s3://osm-pds/2018/planet-180903.osm.pbf .

The file size was about 42GB and took 20 minutes to download. Next, I used Imposm3 instead of Osm2pgsql to read the PBF file and import into postgresql server.

imposm import -config config.json -appendcache -read planet-180903.osm.pbf -write

The config.json specified the database and mapping parameters.

{
     "cachedir": "/home/ubuntu/cache",
     "connection": "postgis://osm:osm@localhost:5432/osm",
     "mapping": "mapping.json",
     "srid": 4326
}

Since I was only interested in the location of hospitals and didn't require other tags or admin boundaries, relations etc, this is how my mapping.json looked.

{
     "tags": {"load_all": true},
     "tables": {
         "amenities": {
             "columns": [
                 {
                     "type": "id",
                     "name": "osm_id",
                     "key": null
                 },
                 {
                     "type": "geometry",
                     "name": "geometry",
                     "key": null
                 },
                 {
                     "type": "string",
                     "name": "name",
                     "key": "name"
                 },
                 {
                     "type": "mapping_value",
                     "name": "type",
                     "key": null
                 },
                 {
                     "type": "hstore_tags",
                     "name": "tags",
                     "key": "tags"
                 }
             ],
             "type": "polygon",
             "mapping": {
                 "amenity": [
                     "hospital"
                 ]
             }
         }
     }
 }

The imposm import command took nearly 19 hours to import the data into postgresql. You can push the data into the public schema using the -deploytoproduction flag or just use the import schema if you are comfortable with it.

Additionally, I exported the data from the osm_amenities table into a json file (a json line for every row in the table) using the following PSQL command:

COPY (SELECT row_to_json(t)
      FROM (SELECT id, osm_id, name, type,
                   hstore_to_json(tags) as tags,
                   ST_AsGeoJSON(geometry)::json as geometry
            FROM import.osm_amenities) t
     )
TO '/tmp/output.json';