What is a clustered PMTiles archive?

Jan 17, 2024

One detail in the PMTiles version 3 specification is a boolean flag called clustered. Popular tools like the pmtiles CLI, tippecanoe and Planetiler always create clustered archives. PMTiles is an open specification in the public domain, so this post is to aid developers in implementing this optional feature.

Clustered

The term clustered comes from relational databases: the PostgreSQL manual defines that “When a table is clustered, it is physically reordered based on the index information.”

A PostgreSQL index may be created over any column or set of columns; the directories of a PMTiles archive are the analogue of a database index, where the lookup key is the Hilbert TileID. Setting the clustered property of a PMTiles archive means that the ordering of the tile data on disk matches the directories. Since directories must be sorted by Hilbert TileID, the tiles must also be in Hilbert order.

De-duplication

Ordering the tile contents in PMTiles has one small catch - tiles are allowed to be de-duplicated. This is essential for representing the ocean on basemaps, otherwise 70% of the archive would be repetitive water tiles.

A toy example of a tileset with 4 tiles of length=100, 2 of which are the same tile contents:

                                   ┌───┐┌───┐
TileID=1, offset=0,                │ A ││ O │
TileID=2, offset=200 (ocean tile)  └───┘└───┘
TileID=3, offset=100               ┌───┐┌───┐
TileID=4, offset=200 (ocean tile)  │ O ││ B │                                
                                   └───┘└───┘

Because the tile with offset=200 appears twice, it’s ambiguous exactly where it should go relative to other tiles. To resolve this, the PMTiles spec defines clustered as only having backward references to lower offsets in the file, never jumping forward to a higher offset.

                                   ┌───┐┌───┐
TileID=1, offset=0,                │ A ││ O │
TileID=2, offset=100 (ocean tile)  └───┘└───┘
TileID=3, offset=200               ┌───┐┌───┐
TileID=4, offset=100 (ocean tile)  │ O ││ B │                                
                                   └───┘└───┘

A properly clustered archive has the first tile contents at offset 0, and all unique tile contents are contiguous for the entire file.

Locality

The historical motivation for clustering database tables was to improve locality. When most servers used magnetic hard drives, disk access to nearby parts of a file was much faster than far-away parts. PMTiles is designed to work with storage systems like S3: cloud blob store APIs don’t exhibit a huge difference in latency depending on data locality. But if you’re accessing a .pmtiles on disk, or on a magnetic hard drive, the seek time should be lower for a typical task like requesting multiple tiles in one area.

Directory delta encoding

A planet-scale tileset like the Protomaps basemap addresses over one billion tiles. Storing this directory using a naive encoding like TileID(uint64),Offset(uint64),Length(uint64) is about 20 gigabytes of raw data. Since the map user can zoom and pan anywhere on the map, and parts of the directory must be fetched on-demand, it’s key to make this encoding as small as possible. The directory encoding delta-encodes TileIDs; if an archive is clustered, it can also effectively delta-encode offsets, since entries are contiguous (excluding the small % of de-duplicated tiles).

Another way to view this optimization is as a compromise between sparse and dense directories. A sparse representation of the directory is an associative map between TileID and offset+length:

{
  1:[0,100],
  2:[100,100],
  3:[200,100],
  4:[100,100]
}

On the other hand, a dense representation of the directory is an array, where the index in the array is the TileID:

[[0,100],[100,100],[200,100],[100,100]]

Whether a sparse or dense data structure is optimal depends on the data being stored. A Canada wildfires tileset may be widely dispersed, where a sparse structure is better; or they might be OSM basemaps for a populous city that blanket an entire area of continuous tiles. A dense array-like format would ideal, but using it for wildfires means storing many directories full of zero or null entries.

PMTiles uses a sparse directory layout, but by delta-encoding clustered entries, the on-disk bytes are much like a dense layout where only the offset is stored.

Determinism

A more significant advantage of clustered archives is that they make archive layout deterministic: running a program on the same input twice should produce byte-for-byte identical output. Storage engines like SQLite, used in MBTiles, can create different output bytes based on order of operations or library version. Tileset creation is computationally expensive, so programs generate tiles on parallel CPU threads and may finish in indeterminate order. By defining a single, canonical ordering of tiles, detecting data changes in PMTiles works with checksums like MD5, and secure verification of outputs is possible with SHA-256 or Blake3.

Mike Barry has recently enhanced Planetiler’s determinism by ensuring vector tile contents like polygon ring ordering and tag ordering are consistent between runs.

pmtiles verify

The pmtiles CLI now includes a verify operation that validates the contiguous Hilbert ordering of all tile contents, with no forward references.

Recommendations

If your only use case is deploying a PMTiles file on S3 for web map visualization, clustered archives aren’t essential: they only minimize the space taken up by the directory.

Clustered archives are mandatory for more advanced operations on PMTiles, such as extracting a subset of tiles from a larger archive, which will be the subject of a future post.