The geospatial community faces distinct challenges in storing, processing, and sharing data at scale. As mapping datasets grow in complexity and size, traditional file formats struggle to keep pace with modern cloud environments and distributed computing needs.

Overture uses GeoParquet as a core format for our datasets. We sat down with Overture’s CTO, Amy Rose, to dig into what this decision means for developers and why GeoParquet is becoming such a game-changer for working with geospatial data.

What is GeoParquet?

GeoParquet is an extension of Apache Parquet, a columnar file format that has existed since 2013. The “Geo” part adds a standard way to store geospatial data types, such as points, lines, and polygons, alongside the other columns in a Parquet file.

This open standard, developed by the community, brings geospatial data into the mainstream of big data processing by extending an established format rather than creating yet another specialized GIS format.
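
As a rough illustration of what that extension looks like in practice, the sketch below builds the kind of JSON document that the GeoParquet 1.0 specification stores under the `geo` key of a Parquet file’s metadata, and encodes a point as well-known binary (WKB), the encoding the specification uses for geometry columns. The column name `geometry`, the coordinates, and the helper names are illustrative assumptions. It uses only the Python standard library and does not write an actual Parquet file.

```python
import json
import struct


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB: byte order, geometry type, x, y."""
    # 1 = little-endian byte order, 1 = WKB geometry type "Point".
    return struct.pack("<BIdd", 1, 1, x, y)


def build_geo_metadata(column: str, bbox: list) -> str:
    """Build a minimal GeoParquet-style 'geo' metadata document as JSON."""
    metadata = {
        "version": "1.0.0",
        "primary_column": column,
        "columns": {
            column: {
                "encoding": "WKB",
                "geometry_types": ["Point"],
                "bbox": bbox,  # optional field: [xmin, ymin, xmax, ymax]
            }
        },
    }
    return json.dumps(metadata)


# Hypothetical sample point (longitude, latitude).
wkb = point_to_wkb(-122.33, 47.61)
geo_json = build_geo_metadata("geometry", [-122.33, 47.61, -122.33, 47.61])
```

Because the geometry is an ordinary binary column plus a small block of metadata, a tool that knows nothing about GIS can still read every other column in the file.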

Breaking Down Data Silos

For too long, geospatial data has lived in its own world, separate from mainstream data processing. This split happened because location data needs special handling — not just storing points, lines, and polygons, but managing coordinate systems, spatial relationships, and specialized indexing for efficient querying.

GeoParquet helps fix this problem by treating geospatial data as just another type of data that can work with standard tools. Amy points out why this matters: “The more that we start to think about it as just another data type, the more acceptance and adoption into workflows and analytics it will have in the mainstream. More and more organizations can take advantage of location data and location intelligence.”

This approach means you can use familiar tools for both spatial and non-spatial data instead of needing separate systems for each.

Why Overture Chose GeoParquet

When Overture Maps Foundation was getting started, we needed to pick a format that matched our goals of making mapping data accessible and easy to use. “We’re trying to make the data easier to use and integrate, not just for GIS specialists but for all kinds of developers who want to work with spatial data,” Amy explained. “That means choosing formats that are open, efficient, and compatible with modern cloud workflows.”

GeoParquet stood out for several practical reasons:

1. Works with tools you already use

GeoParquet builds on the Parquet format, so it connects naturally with cloud platforms and data processing tools. This compatibility means developers can work with Overture’s data using the same systems and workflows they already rely on — no new specialized tools required.

2. Handles large map datasets well

Overture’s global-scale mapping data involves massive and continuously growing datasets. GeoParquet’s columnar structure and optimization for big data querying make it a natural fit for efficiently handling high volumes of geospatial information, both for storage and for fast access.

3. You only need to grab the data you want

Instead of downloading gigabytes of data just to look at one neighborhood, GeoParquet lets developers access exactly the slices they need. This support for selective reads dramatically improves efficiency for developers and data scientists working at any scale.
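
The idea behind selective reads can be sketched in a few lines. Parquet files are split into row groups that carry per-column minimum and maximum statistics, so a reader can skip whole groups that cannot match a filter and then decode only the requested columns. The following is a simplified pure-Python simulation of that behavior, not a real Parquet reader. The `RowGroup` class, the column names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """A chunk of rows stored column by column."""
    columns: dict

    def stats(self, name: str) -> tuple:
        # A real file stores these statistics in its footer; here they are
        # computed on the fly to keep the sketch short.
        values = self.columns[name]
        return min(values), max(values)


def read_bbox(groups, xmin, ymin, xmax, ymax, wanted):
    """Return the wanted columns for rows inside the box, skipping groups by stats."""
    rows, groups_read = [], 0
    for group in groups:
        gx_min, gx_max = group.stats("x")
        gy_min, gy_max = group.stats("y")
        # Skip the whole group if its extent cannot overlap the query box.
        if gx_max < xmin or gx_min > xmax or gy_max < ymin or gy_min > ymax:
            continue
        groups_read += 1
        for i, (x, y) in enumerate(zip(group.columns["x"], group.columns["y"])):
            if xmin <= x <= xmax and ymin <= y <= ymax:
                # Only the requested columns are read for matching rows.
                rows.append({name: group.columns[name][i] for name in wanted})
    return rows, groups_read


groups = [
    RowGroup({"x": [0.1, 0.4], "y": [0.2, 0.3], "name": ["cafe", "park"], "height": [3, 0]}),
    RowGroup({"x": [10.0, 10.5], "y": [5.0, 5.2], "name": ["tower", "mall"], "height": [90, 20]}),
]
rows, groups_read = read_bbox(groups, 0.0, 0.0, 0.3, 1.0, wanted=["name"])
```

Here the second row group is never scanned because its x range lies entirely outside the query box, and the `height` column is never touched because it was not requested.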

The Early Days and Community Growth

When Overture launched in 2022, GeoParquet was still a draft specification with limited tooling and adoption. Choosing it that early came with challenges: many early users were unfamiliar with the format and lacked the tools to work with it effectively.

However, Overture judged that the long-term benefits outweighed the short-term friction. Tooling around GeoParquet has since developed rapidly, fueled by the open source community and by the availability of Overture’s data as a practical test case. The ecosystem has grown faster than initially anticipated, which shows what open source, community-driven development can do.

Helping Developers Get Started

To make GeoParquet easier to work with, we’ve built guides and tutorials that address common use cases and integration paths for developers of all backgrounds.

Our resource library continues to grow and includes practical guides for:

  • Accessing Overture data through AWS Athena or Azure Synapse
  • Running queries in DuckDB
  • Visualizing data in Kepler.gl
  • Using Overture data in desktop GIS tools like QGIS
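
As one example of the DuckDB path, the sketch below assembles a query that reads only a bounding box from the places theme. It assumes the release is published under an S3 prefix of the form `s3://overturemaps-us-west-2/release/<release>/theme=places/type=place/`, that each row carries a `bbox` struct with `xmin`, `xmax`, `ymin`, and `ymax` fields, and that names are exposed as `names.primary`. Check the current Overture documentation for the actual paths and schema. The function name and the release string passed to it are placeholders.

```python
def build_places_query(release: str, xmin: float, ymin: float, xmax: float, ymax: float) -> str:
    """Build DuckDB SQL selecting places whose bbox falls inside the given box."""
    # Assumed path layout; confirm it against the Overture documentation.
    path = f"s3://overturemaps-us-west-2/release/{release}/theme=places/type=place/*"
    return f"""
    SELECT id, names.primary AS name, geometry
    FROM read_parquet('{path}', hive_partitioning = 1)
    WHERE bbox.xmin >= {xmin} AND bbox.xmax <= {xmax}
      AND bbox.ymin >= {ymin} AND bbox.ymax <= {ymax}
    """.strip()


# "<release>" is a placeholder, not a real release name.
sql = build_places_query("<release>", -122.36, 47.59, -122.31, 47.63)

# To run it, with the duckdb package installed and network access available:
#   import duckdb
#   con = duckdb.connect()
#   con.execute("INSTALL httpfs")
#   con.execute("LOAD httpfs")
#   con.execute("SET s3_region = 'us-west-2'")
#   rows = con.execute(sql).fetchall()
```

Filtering on the `bbox` fields rather than on the geometry itself lets the engine use plain numeric comparisons, which is what allows it to skip data that falls outside the area of interest.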

These resources bridge the gap between traditional geospatial workflows and cloud-native approaches, ensuring that all developers — regardless of their background — can successfully work with Overture’s data.

Cloud Native + GERS: Why They Work Together

GeoParquet’s cloud-native design changes how developers can work with geospatial data. Instead of downloading entire datasets to extract specific information, you can query just the pieces you need directly in the cloud. This helps avoid the common problem of working with outdated local copies and makes the whole system more interoperable.

At the same time, the Global Entity Reference System (GERS) brings a critical new layer to the geospatial ecosystem: a way to assign stable, unique, persistent IDs to real-world features like buildings, roads, and places. Think of it as a reference map that anchors what exists in the physical world to digital representations that everyone can refer to using the same identifiers.

GERS and GeoParquet complement each other in powerful ways. When we deliver GERS through cloud-native formats like GeoParquet, the benefits multiply. Updates to the reference map are available to everyone as soon as they are published. You don’t need to download new versions or process local copies; you can connect directly to the latest data. As Amy Rose explains, this creates “a persistent single source of truth that people can integrate directly into their workflow.”

Think of GERS as providing stable IDs that everyone can reference, while GeoParquet provides the efficient, cloud-based way to access and work with that data. Together, they’re helping to create a more connected geospatial data ecosystem where different datasets can talk to each other through shared identifiers.
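
A small sketch of what shared identifiers make possible: two independently maintained datasets that both carry the same stable ID can be combined with a plain key lookup, with no spatial matching step. The IDs, field names, and records below are made up for illustration and do not reflect the actual GERS ID format.

```python
# Hypothetical building attributes keyed by a shared, stable identifier.
buildings = {
    "id-0001": {"height_m": 42.0},
    "id-0002": {"height_m": 8.5},
}

# Hypothetical records from a second dataset that reference the same IDs.
inspections = [
    {"gers_id": "id-0001", "last_inspected": "2024-11-02"},
    {"gers_id": "id-0003", "last_inspected": "2025-01-15"},
]


def join_on_id(features: dict, records: list, key: str = "gers_id") -> list:
    """Attach feature attributes to every record whose ID is in the features table."""
    joined = []
    for record in records:
        feature = features.get(record[key])
        if feature is not None:
            joined.append({**record, **feature})
    return joined


matched = join_on_id(buildings, inspections)
```

Only the record for `id-0001` is matched, since `id-0003` has no counterpart in the buildings table. The same pattern is an ordinary join on an ID column in SQL or a dataframe library.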

See It in Action: CNG Conference 2025

Want to see how it works? Join us at the Cloud Native Geospatial (CNG) Conference 2025, April 30 – May 2, where the Overture Maps team will showcase our cloud-native approach through a series of talks, workshops, and panels.

Don’t Miss Our Workshop

Interfacing with Cloud-Native Overture Data and the GERS Ecosystem

Wednesday, April 30 | 1:15 PM – 2:45 PM

A hands-on session exploring Overture’s GeoParquet datasets and the Global Entity Reference System (GERS), including real-world examples of how developers and GIS pros are using these tools today.

We’re also leading talks and panels throughout the week. Check out the full schedule and meet the team:

View the complete schedule and register for CNG Conference 2025