PMTiles version 3: Hilbert Tile IDs and Run-Length Encoding
PMTiles is a single-file archive format for pyramids of tile data, like those used to power interactive maps. It's designed to make storage and serving of planet-scale tiled maps simple using only affordable S3-like storage and HTTP Range Requests.
The physical world depicted in maps is infinitely detailed, but also not random like white noise. By exploiting redundancy in the physical world, we can make specialized formats for GIS more compact and efficient.
The universe is sparse at all scales: dense clusters of unique features, with wide swaths of similarity
One step towards this in the first public version of PMTiles (specification version 2) was tile deduplication. The set of all OpenStreetMap PNG tiles will have many references to the same blue square representing the ocean.
Deploying PMTiles v2 for further real-world use cases and datasets has resulted in a few refinements to the design, the first of which is related to the internal tile addressing format for version 3. The new specification is being finalized and discussed on GitHub in time for the FOSS4G conference happening this year in Firenze, Italy.
Old vs. New Design
Refresher: the existing
Entry record in PMTiles version 2 looks like this, inspired by the popular MBTiles SQLite-based format:
uint8 Z uint24 X uint24 Y uint48 Offset uint32 Length
Example: the entry
Z=8 X=40 Y=87 points to byte range
offset=4240340 length=32836 in a PMTiles file, which is a 32.7 kB PNG image near Vancouver, British Columbia:
The v3 design simplifies the
Entry struct to fewer and more standard column types:
uint64 TileId uint32 RunLength uint64 Offset uint32 Length
The equivalent of the above in the new design would be
TileId=36052 RunLength=1 Offset=4240340 Length=32836.
TileId 36052 corresponds to the Z,X,Y position of
8,40,87. The calculation of ID uses a pyramid of Hilbert curves starting at
TileId=0 for zoom level 0. The next zoom level, a 2x2 square, occupies the next four IDs in the ID space
TileId=(1,2,3,4), the next level being the next 16 IDs, and so on.
In the example British Columbia tile above, the
Z,X,Y position of
8,40,87 corresponds to the 14207th Hilbert curve position on level 8, which starts at
PMTiles v3 replaces
Z,X,Y with the single
TileID field, and also adds a new field
The previous PMTiles design deduplicates repetitive tiles like blue ocean squares via multiple entries pointing to the same data. However, it can repeat the same entry millions of times for large regions of oceans, which cover 70% of the planet.
Ocean tiles are not only repetitive, but sparse and often contiguous in Hilbert space. This entry:
means that the 44 byte vector tile with a single square in the layer
ocean is repeated over 100,000 times, starting at
Z,X,Y=11,285,1311 and ending at
11,19,1304. This single contiguous sequence is represented by the yellow-green gradient in the planet image below:
What previously required 107,977 separate directory entries, or 1.8 megabytes of storage, can now be repesented by a single 24 byte entry. This specific case of a 75,000x compression ratio is useful to minimize storage costs and transfer latency of map tiles, which can help developers build and deploy faster, more flexible maps.
The next post will address the design of compressed directories in PMTiles version 3.
Contiguous Hilbert runs