OSM Express is a database file format for OpenStreetMap data (.osmx), as well as a command line tool and C++ library for reading and writing .osmx files.
Illustration of the cell covering for a rectangular input region and its overlap with indexed OpenStreetMap geometries.
Here are some use cases that OSM Express fits well.
nametags for a given way that represents a building, and construct geometries for ways and relations.
Binaries are available for MacOS (Darwin) and GNU/Linux at GitHub Releases.
For information on how to compile the osmx program from source, see the Programming Guide.
Once you have the osmx command line program, you'll need to start with an .osm.pbf or OSM XML file. The Planet file is available at planet.openstreetmap.org, but it's preferable to begin with something smaller to learn with.
There are numerous sites for downloading .osm.pbf extracts, including Protomaps Minutely Extracts, a service itself powered by OSM Express. For testing purposes let's start with this small PBF I generated of New York County:
Create an .osmx file by using the expand command on the .osm.pbf file:
osmx expand new_york_county.osm.pbf new_york_county.osmx
This will result in a 91 MB .osmx file.
We can look up an object inside this .osmx file by ID. For a way, the output lists the IDs of its member nodes, followed by all of its tags:
osmx query new_york_county.osmx way 34633854
> 402743563 402743567 402743571 402743573 2709307502 2709307499 2709307464 402743563
addr:city=New York City
addr:housenumber=350
addr:postcode=10018
...
We can also extract regions of the .osmx file into a new .osm.pbf file, which is useful for interoperability with other OSM tools.
osmx extract new_york_county.osmx downtown.osm.pbf --bbox 40.7411\,-73.9937\,40.7486\,-73.9821
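When the extract is driven from a script rather than a shell, the same command can be assembled as an argument list. This is only a sketch: the helper name `extract_command` is hypothetical, and the coordinate order (south, west, north, east) is inferred from the example command, not taken from osmx documentation. The code builds the arguments but does not run osmx.

```python
def extract_command(src, dst, south, west, north, east):
    """Build an argv list for `osmx extract` with a --bbox filter.

    Hypothetical helper; the coordinate order is an assumption inferred
    from the example command in this README.
    """
    bbox = ",".join(str(v) for v in (south, west, north, east))
    return ["osmx", "extract", src, dst, "--bbox", bbox]


cmd = extract_command("new_york_county.osmx", "downtown.osm.pbf",
                      40.7411, -73.9937, 40.7486, -73.9821)
print(" ".join(cmd))
```

A list like this can be handed to `subprocess.run(cmd, check=True)`. Because no shell is involved, the commas in the bounding box need no escaping.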
The OSM Express library is intentionally minimal and non-opinionated; for example, no attempt is made to transform OSM tags to a fixed schema, distinguish between polygon and linear ways, or assemble multipolygon relations into polygons. For these typical tasks it's recommended to use OSM Express as a library in your own program. Documentation and example code are available at the Programming Guide.
An .osmx file can be opened and queried directly in a Python program using the osmx Python package. See Python for details.
Languages other than Python may be supported in the future by either language-specific libraries or a new C API. See Development if you're interested or discuss on GitHub.
A full planet.osmx created from planet.osm.pbf (47 GB) is around 580 GB.
OSM Express is optimized for fast lookups, extracts and updates. These goals are at odds with making the database as compact as possible. A typical .osmx file can be 10 times the size of the corresponding .osm.pbf, because:
The mmap-based design of LMDB and Cap'n Proto requires that fields are word-aligned on disk, causing storage overhead.
As of 2019, fast local storage is cheap; 1 terabyte solid state drives are less than 150 USD. On managed hosting providers like AWS and Google Cloud, extra storage is affordable compared to more memory or CPU cores.
If it's necessary to optimize for storage space, an .osmx file can be stored on a filesystem with transparent compression such as ZFS or Btrfs, at the cost of CPU overhead. This can reduce planet.osmx to around 200 GB.
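To put the planet figures quoted above side by side, here is a small worked calculation. It uses only the sizes stated in this section; the variable names are illustrative.

```python
# Sizes in gigabytes, taken from the figures quoted in this section.
PLANET_PBF_GB = 47
PLANET_OSMX_GB = 580
PLANET_OSMX_COMPRESSED_GB = 200

expansion = PLANET_OSMX_GB / PLANET_PBF_GB                  # .osmx relative to .osm.pbf
compression = PLANET_OSMX_GB / PLANET_OSMX_COMPRESSED_GB    # gain from filesystem compression
compressed_vs_pbf = PLANET_OSMX_COMPRESSED_GB / PLANET_PBF_GB

print(f"planet.osmx / planet.osm.pbf:      {expansion:.1f}x")
print(f"uncompressed / compressed .osmx:   {compression:.1f}x")
print(f"compressed .osmx / planet.osm.pbf: {compressed_vs_pbf:.1f}x")
```

For the planet, the expansion comes out at roughly 12 times the .osm.pbf size, and a compressing filesystem brings that down to a little over 4 times.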
OSM Express stores all metadata (version, timestamp, changeset, username and user ID) for all OSM objects, except for untagged nodes. The --noUserData flag omits changeset, username and user ID information from extracts, to comply with GDPR guidelines.
OSM Express should work with reasonable amounts of memory, less than 8 gigabytes, even for extract on planet.osmx. The strongest predictor of performance is I/O latency. When benchmarking different storage environments, I/O latency is best measured as IOPS at queue depth 1.
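As a rough illustration of what "IOPS at queue depth 1" means, the sketch below issues random single-block reads strictly one after another and reports reads per second. The function name `qd1_read_iops` and the 4096-byte block size are assumptions made for this example. It relies on `os.pread`, so it is limited to Unix-like systems, and it does not bypass the operating system's page cache, so a dedicated benchmarking tool will give more trustworthy numbers.

```python
import os
import random
import time


def qd1_read_iops(path, reads=1000, block_size=4096):
    """Issue random single-block reads one at a time and return reads/second.

    Hypothetical helper. The OS page cache is not bypassed, so results on a
    recently read file reflect memory speed rather than the storage device.
    """
    size = os.path.getsize(path)
    if size < block_size:
        raise ValueError("file is smaller than one block")
    blocks = size // block_size
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(reads):
            offset = random.randrange(blocks) * block_size
            os.pread(fd, block_size, offset)  # returns only once this read is done
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return reads / elapsed
```

Each read waits for the previous one to finish, so only one request is ever outstanding. That is the queue depth 1 condition, and the resulting rate is the inverse of the average per-read latency.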
Running the osmx query command with no arguments reveals the layout of an .osmx database:
osmx query planet.osmx
locations: 5313351219
nodes: 144307630
ways: 590470034
relations: 6895065
cell_node: 5313351219
node_way: 5906888644
node_relation: 10242142
way_relation: 63350432
relation_relation: 497137
An .osmx file is an LMDB database with 10 sub-databases. All keys are 64-bit integers.
locations: maps a node ID to a 64-bit location, with 32 bits for each of lat, lon.
nodes, ways and relations map object IDs to a Cap'n Proto message as described in
cell_node maps a level 16 S2 cell to a node ID, using LMDB DUPSORT (sorted duplicate keys).
node_way, node_relation, way_relation and relation_relation map object IDs to their parent objects, also using DUPSORT.
The metadata sub-database holds arbitrary string:string values. This is used to store the replication sequence number and timestamp.
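To make the locations entry in the list above more concrete, here is one way a latitude/longitude pair can fit in 64 bits with 32 bits per coordinate. This is a sketch under stated assumptions, not the on-disk layout of OSM Express: the fixed-point scale of 1e-7 degrees, the little-endian byte order, the lat-then-lon field order and the function names are all chosen for illustration.

```python
import struct

SCALE = 10_000_000  # assumed fixed-point unit: 1e-7 degrees


def pack_location(lat, lon):
    """Pack a coordinate into 8 bytes as two signed 32-bit integers."""
    return struct.pack("<ii", round(lat * SCALE), round(lon * SCALE))


def unpack_location(buf):
    """Reverse pack_location, returning (lat, lon) in degrees."""
    lat_i, lon_i = struct.unpack("<ii", buf)
    return lat_i / SCALE, lon_i / SCALE


packed = pack_location(40.7484, -73.9857)
lat, lon = unpack_location(packed)
print(len(packed), lat, lon)
```

At this scale the largest magnitude needed, 180 degrees of longitude, becomes 1,800,000,000, which fits within the signed 32-bit range.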
It is important to note that LMDB transactions span all sub-databases. This means that a read operation will retrieve the correct timestamp for the data it fetches, even if the database is written to while the read is happening.
OSM Express avoids expensive point-in-polygon computations for spatial operations. Instead, a query region is approximated by S2 cells with maximum level 16. Level 16 was chosen as a reasonable tradeoff between covering precision and storage space.
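The idea of answering a spatial query by cell lookup instead of geometry tests can be sketched with a toy index. Everything here is an assumption for illustration: the cell IDs are small made-up integers rather than real S2 cell IDs, the sorted list of pairs only imitates the ordering that DUPSORT provides, and the covering is taken to consist of level 16 cells only. It does not use LMDB or an S2 library.

```python
from bisect import bisect_left, bisect_right

# Toy stand-in for the cell_node index: (cell_id, node_id) pairs kept sorted,
# imitating several values stored in order under one key.
# The IDs are invented for illustration and are not real S2 cell IDs.
cell_node = sorted([
    (100, 7), (100, 3), (101, 9), (103, 4), (103, 5), (200, 1),
])


def nodes_in_cells(index, covering):
    """Return node IDs whose cell is in the covering, without geometry tests."""
    found = []
    for cell in covering:
        lo = bisect_left(index, (cell, float("-inf")))
        hi = bisect_right(index, (cell, float("inf")))
        found.extend(node for _, node in index[lo:hi])
    return found


covering = [100, 103]  # hypothetical cells approximating a query region
print(nodes_in_cells(cell_node, covering))
```

Because the covering only approximates the region, a lookup of this kind can return nodes that sit in a covering cell but just outside the exact boundary. That is the precision side of the tradeoff mentioned above.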
Author's note: the S2 Covering of a region may differ depending on choice of architecture and compiler, while still being valid. Let me know if you know how to make this consistent.
If you'd like to sponsor development of OSM Express features, or integrate it into your product, get in contact at [email protected].