Data Extraction Community Guideline

Community Guidelines for submitting map data resulting from automated data extraction from the internet

Overture Maps Foundation is building open map data using a wide variety of data sources. Map data is the digital representation of the physical world, and our goal is for that representation to be as accurate and up-to-date as possible.

Map-related data derived from automated web data extraction methods (“Automated Extraction”) is one of several valuable sources. To maintain a respectful environment that strives for both permissively licensed and high-quality data, we have established these guidelines to outline the conditions under which we will accept map data resulting from Automated Extraction.

Definition: Automated Extraction from the World Wide Web refers to the systematic and programmatic retrieval of information. It encompasses both web scraping, which focuses on extracting specific data from individual web pages, and web crawling, which involves automated navigation to collect and index content from multiple pages or websites. These techniques utilize automation to efficiently gather, organize, and analyze data for various purposes such as content indexing, data analysis, and monitoring changes on the web.

  1. Map data for publicly observable spaces

Overture Maps Foundation focuses on map data about physical objects that can be publicly observed and are relevant to map applications. Our goal is to collect high quality data that reflects the physical world as it changes. We do not collect or publish data about non-public spaces or objects. If something is not observable from a publicly accessible area, it does not fit in Overture’s datasets.

  2. Global Entity Reference System (GERS)

Entities in Overture map data (for example, streets, buildings, places) contain a persistent identifier (called a GERS ID) that unambiguously identifies that entity. Data from Automated Extraction will be matched to a GERS ID (if the input data is not already matched) and used to create, improve or confirm the map data about an entity.

  3. Compliance with Legal and Ethical Standards:

We emphasize the importance of adhering to applicable laws and ethical standards when engaging in Automated Extraction activities to collect map data. Contributors must ensure that their actions comply with local, national, and international regulations, respecting intellectual property rights, privacy laws, and other legal considerations.

  4. Multiple Source Verification:

Map data obtained through Automated Extraction should be cross-referenced with reputable sources to verify its accuracy.

  5. Transparency and Openness:

Contributors are encouraged to be transparent about their Automated Extraction operations. Contributors must not disguise the source of the data or obfuscate the Automated Extraction tool. Provide clear documentation on the sources, methods, and frequency of data collection to foster trust within the community. Transparency promotes accountability and allows others to understand the context of the contributed map data.

  6. Respect for Terms of Service and Compliance with Protocols:

Map data obtained through Automated Extraction should be collected in accordance with the terms of service of the websites from which it originates. Any Automated Extraction tools must follow the Robots Exclusion Protocol (robotstxt.org) and the Sitemaps Protocol (sitemaps.org).
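As an illustration of what following the Robots Exclusion Protocol can look like in practice, the sketch below uses Python's standard `urllib.robotparser` module to check whether a URL may be fetched before requesting it. It is a minimal example, not an Overture requirement or reference implementation: the user agent name `example-map-extractor`, the `example.com` URLs, and the sample robots.txt content are all hypothetical. The rules are parsed from an inline string so the example runs without network access; a real tool would load the site's actual robots.txt (for example with `set_url()` and `read()`). Checking robots.txt does not replace reading the site's terms of service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical identifier for the extraction tool. A descriptive, honest
# user agent also supports the transparency guideline above.
USER_AGENT = "example-map-extractor"

# Hypothetical robots.txt content, inlined so the sketch runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str,
              user_agent: str = USER_AGENT) -> bool:
    """Return True only if the parsed rules allow this agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A path that no rule disallows, and one under the disallowed /private/ prefix.
print(may_fetch(parser, "https://example.com/stores/123"))
print(may_fetch(parser, "https://example.com/private/staff"))

# Crawl-delay (in seconds) and Sitemap entries declared by the site, if any.
print(parser.crawl_delay(USER_AGENT))
print(parser.site_maps())
```

A tool built along these lines would skip any URL for which `may_fetch` returns `False`, wait at least the declared crawl delay between requests, and could use the listed sitemaps to discover pages instead of crawling blindly.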

  7. License:

All submitted data must be provided under a permissive data license that is, or is compatible with, the Community Data License Agreement – Permissive, Version 2.0 (CDLA Permissive v2.0).

  8. User Privacy Protection:

Overture does not want any data that includes Personally Identifiable Information (PII). Please make every effort to exclude PII from data submissions. If PII is discovered within the map data, Overture Maps Foundation has a data takedown policy that can help address the removal of PII upon request.

  9. Compliance Review and Enforcement:

Overture Maps Foundation reserves the right to review contributed map data for compliance with these guidelines. Violations may result in the removal of the data and, in severe cases, suspension or expulsion of the contributor from the community. Enforcement actions will be taken with fairness and transparency.

Conclusion:

By following these guidelines, we aim to create a collaborative and ethical space for sharing map data obtained through Automated Extraction. Your contributions play a vital role in building a valuable resource for the community. Thank you for your commitment to responsible data sharing and community collaboration.