Smartphones, connected vehicles, drones, Earth-observation satellites, and billions of IoT sensors now stream precise coordinates 24 × 7. Analysts want to ask, “What happened here, why, and what will happen next?” Yet, the classic lat/lon columns backed by a B-tree grind to a halt once tables cross the hundred-million-row mark.
Spatial indexes such as Uber’s H3 hex grid and Microsoft’s quad-tile (quadkey) system assign every coordinate to a compact, hierarchical key. Those keys behave like ordinary strings/ints, so they can be range-scanned, sharded, and joined at warehouse scale—turning what used to be geometry math into nothing more exotic than group-by and left join.
Before diving into the indexes, let’s ground ourselves in the analytics patterns you’ll actually run:
Pipelines usually unfold as:
In summary, spatial indexing is the fulcrum that lets you swap geometry math for plain SQL—so the same engine that crunches sales data can suddenly crunch street networks too.
H3 projects an icosahedron onto the sphere, then recursively subdivides each face into hexagons (plus 12 unavoidable pentagons). Resolution 0 hexes are ~1,100 km across; by res 15 they shrink to ~75 cm. The 64-bit key encodes face → child path → resolution, so every cell instantly knows its parent and six neighbors.
Equal-area-ish footprints make statistics fair, and constant-degree adjacency makes regional roll-ups trivial.
import h3, geopandas as gpd
gdf = gpd.read_file("my_points.geojson")
gdf["h3_res7"] = gdf.geometry.apply(
lambda p: h3.geo_to_h3(p.y, p.x, 7)
)
A simple `GROUP BY h3_res7`
now bins millions of pings into an instant heat-map.
Quadkeys start with the familiar slippy-map pyramid: world tile at zoom 0, each deeper zoom quadruples resolution. Concatenate child numbers (0-3) on the descent to produce a quadkey like `023112`
. The scheme inherits Web Mercator’s distortion but matches every browser, CDN, and game engine on Earth.
from math import tan, pi, log, floor, sinh
def latlon_to_quadkey(lat, lon, zoom):
siny = sinh(lat * pi / 180)
x = (lon + 180) / 360
y = 0.5 - log((1 + siny) / (1 - siny)) / (4 * pi)
n = 1 << zoom
tx, ty = floor(x * n), floor(y * n)
qk = ""
for i in range(zoom, 0, -1):
bit, mask = 0, 1 << (i - 1)
if tx & mask: bit += 1
if ty & mask: bit += 2
qk += str(bit)
return qk
Feed `zoom=15`
and you get a deterministic CDN file path like `/tiles/023112231012130.png`
.
quadkey.png`
tiles let CDNs serve billions of images with zero DB hits.
WITH q AS (
SELECT '023112231012130' AS quadkey
),
b AS (
SELECT ST_GeogFromQuadKey(quadkey) AS geom
FROM q
)
SELECT cell_id
FROM b, UNNEST(H3_POLYFILL(geom, 9)) AS cell_id;
Reverse the flow by exploding each H3 cell to `ST_GEOGFROMH3(cell)`
and testing `ST_WITHIN`
against tile bounds.
h3_to_parent(res=4)`
or the first 3 chars of the quadkey.ST_GEOGFROMQUADKEY`
make cross-walks one-liners.
TileLayer`
for quadkeys; add `h3-js`
for client-side hex overlays.h3`
column for instant 3-D extruded hex-bins.
(x,y,z,t)`
cubes for LiDAR, weather, and CFD modeling—active research in environmental modeling.
Over the last decade the geo-data firehose has forced us to treat where as a first-class analytic dimension—no different from when or who. The post walked through two indexing “dialects” that let you do exactly that:
Guidelines for your next build:
Treat H3 and quadkeys not as competing fads but as complementary building blocks. Once every record in your stack carries a spatial key, you unlock earth-scale insight at SQL speed, and suddenly, asking “What happened here?” is as cheap as any ordinary filter.