Published October 18, 2024 | Version v1
Dataset Open

Integrated Approach to Global Land Use and Land Cover Reference Data Harmonization

  • 1. Instituto Mauro Borges de Estatísticas e Estudos Socioeconômicos (IMB)
  • 2. Laboratório de Processamento de Imagens e Geoprocessamento (LAPIG)
  • 3. OpenGeoHub Foundation

Description

INTRODUCTION

This document describes the creation of a global inventory of reference samples and Earth Observation (EO) / gridded datasets for the Global Pasture Watch (GPW) initiative. The inventory supports the training and validation of machine-learning models for GPW grassland mapping. The sections below cover the methodology, data sources, workflow, and results.

Keywords: Grassland, Land Use, Land Cover, Gridded Datasets, Harmonization

 

OBJECTIVES

  • Create a global inventory of existing reference samples for land use and land cover (LULC);

  • Compile global EO / gridded datasets that capture LULC classes and harmonize them to match the GPW classes;

  • Develop automated scripts for data harmonization and integration.

 

DATA COLLECTION 

Datasets incorporated:

Dataset                                                     Spatial distribution  Time period  Number of individual samples
WorldCereal                                                 Global                2016-2021    38,267,911
Global Land Cover Mapping and Estimation (GLanCE)           Global                1985-2021    31,061,694
EuroCrops                                                   Europe                2015-2022    14,742,648
GeoWiki G-GLOPS training dataset                            Global                2021         11,394,623
MapBiomas Brazil                                            Brazil                1985-2018    3,234,370
Land Use/Land Cover Area Frame Survey (LUCAS)               Europe                2006-2018    1,351,293
Dynamic World                                               Global                2019-2020    1,249,983
Land Change Monitoring, Assessment, and Projection (LCMap)  U.S. (CONUS)          1984-2018    874,836
GeoWiki 2012                                                Global                2011-2012    151,942
PREDICTS                                                    Global                1984-2013    16,627
CropHarvest                                                 Global                2018-2021    9,714

Total: 102,355,642 samples

 

WORKFLOW

Harmonization Process

We harmonized the global reference samples and EO/gridded datasets to the GPW classes so that they can be integrated directly into the GPW machine-learning workflow.

We considered reference samples derived by visual interpretation, with a spatial support of at least 30 m (Landsat and Sentinel), that represent an LULC class for a point or a region.
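The two inclusion criteria above (visual interpretation and a spatial support of at least 30 m) can be read as a simple filter over candidate samples. This is a minimal Python sketch, not the project's script: the `Candidate` record and its `visually_interpreted` and `support_m` fields are hypothetical stand-ins for whatever metadata each source dataset actually provides.

```python
from dataclasses import dataclass

# Minimum spatial support stated in the text, in metres.
MIN_SUPPORT_M = 30.0


@dataclass
class Candidate:
    """Hypothetical metadata for one candidate reference sample."""
    sample_id: str
    visually_interpreted: bool  # assumed flag: label derived by visual interpretation
    support_m: float            # assumed field: size of the interpreted footprint, in metres


def is_eligible(c: Candidate, min_support_m: float = MIN_SUPPORT_M) -> bool:
    """Keep samples that are visually interpreted and have at least 30 m of spatial support."""
    return c.visually_interpreted and c.support_m >= min_support_m


candidates = [
    Candidate("a", True, 30.0),    # eligible
    Candidate("b", True, 10.0),    # support below 30 m
    Candidate("c", False, 100.0),  # not visually interpreted
]
print([c.sample_id for c in candidates if is_eligible(c)])  # ['a']
```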

Each dataset was processed using automated Python scripts to download vector files and convert the original LULC classes into the following GPW classes:

       0. Other land cover

       1. Natural and semi-natural grassland

       2. Cultivated grassland

       3. Crops and other related agricultural practices
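Converting the original classes amounts to a per-dataset crosswalk from each source legend to the four GPW codes listed above. The Python sketch below assumes such a lookup table; the `GPWClass` enum names and the labels in `CROSSWALK` are hypothetical placeholders and do not reproduce the legend of any dataset in the inventory.

```python
from enum import IntEnum


class GPWClass(IntEnum):
    """The four GPW target classes, with the codes used in the list above."""
    OTHER_LAND_COVER = 0
    NATURAL_SEMI_NATURAL_GRASSLAND = 1
    CULTIVATED_GRASSLAND = 2
    CROPS_AND_RELATED_PRACTICES = 3


# Hypothetical crosswalk for one source dataset: original label -> GPW class.
# The labels are illustrative only, not taken from a real dataset legend.
CROSSWALK = {
    "Natural grassland": GPWClass.NATURAL_SEMI_NATURAL_GRASSLAND,
    "Planted pasture": GPWClass.CULTIVATED_GRASSLAND,
    "Annual crop": GPWClass.CROPS_AND_RELATED_PRACTICES,
    "Forest": GPWClass.OTHER_LAND_COVER,
    "Water": GPWClass.OTHER_LAND_COVER,
}


def to_gpw(original_class: str, crosswalk: dict[str, GPWClass] = CROSSWALK) -> GPWClass:
    """Translate an original LULC label into a GPW class; unmapped labels raise KeyError."""
    try:
        return crosswalk[original_class]
    except KeyError:
        raise KeyError(f"No GPW mapping defined for class {original_class!r}") from None


labels = ["Planted pasture", "Forest", "Annual crop"]
print([int(to_gpw(lbl)) for lbl in labels])  # [2, 0, 3]
```

Raising on unmapped labels, rather than silently assigning class 0, makes gaps in a crosswalk visible when a new dataset is added.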

We empirically assigned a weight to each sample based on the class description in the original dataset; the weight reflects the level of mixture within that class. Weights range from 1 (low) to 3 (high), with higher weights indicating greater mixture. Samples with low mixture levels are more accurate, which makes them more effective for differentiating typologies and better suited for validation.
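Because the weights are assigned per original class, they can be stored in a lookup table next to the crosswalk and used to select low-mixture samples for validation. A minimal Python sketch, assuming a hypothetical `CLASS_WEIGHTS` table whose labels and values are illustrative only; the text names only the endpoints 1 (low) and 3 (high), so treating 2 as an intermediate level is an assumption.

```python
from dataclasses import dataclass

# Hypothetical per-class mixture weights for one source dataset
# (1 = low mixture, 3 = high mixture). Labels and values are illustrative only.
CLASS_WEIGHTS = {
    "Natural grassland": 1,
    "Grassland with scattered shrubs": 2,
    "Mosaic cropland/natural vegetation": 3,
}


@dataclass
class Sample:
    sample_id: int
    original_lulc_class: str


def weight_of(sample: Sample) -> int:
    """Look up the mixture weight assigned to the sample's original class."""
    w = CLASS_WEIGHTS[sample.original_lulc_class]
    if not 1 <= w <= 3:
        raise ValueError(f"weight {w} outside the 1-3 range")
    return w


def validation_candidates(samples: list[Sample], max_weight: int = 1) -> list[Sample]:
    """Keep samples whose class mixture weight does not exceed max_weight."""
    return [s for s in samples if weight_of(s) <= max_weight]


samples = [
    Sample(1, "Natural grassland"),
    Sample(2, "Mosaic cropland/natural vegetation"),
    Sample(3, "Grassland with scattered shrubs"),
]
print([s.sample_id for s in validation_candidates(samples)])                # [1]
print([s.sample_id for s in validation_candidates(samples, max_weight=2)])  # [1, 3]
```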

The harmonized dataset includes these columns:

Attribute Name       Definition
dataset_name         Original dataset name
reference_year       Reference year of samples from the original dataset
original_lulc_class  LULC class from the original dataset
gpw_lulc_class       Global Pasture Watch LULC class
sample_weight        Sample's weight based on the mixture level within the original LULC class
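A harmonized row can be modeled as a record with exactly these five columns and checked on construction. The Python sketch below assumes each source dataset has already been converted into a list of such records, then concatenates them and writes a single CSV. The `HarmonizedSample` and `write_inventory` names, the CSV output, and the example rows are all hypothetical; the released files may use a different format and carry additional fields.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import TextIO


@dataclass(frozen=True)
class HarmonizedSample:
    """One row of the harmonized inventory, using the column names listed above."""
    dataset_name: str
    reference_year: int
    original_lulc_class: str
    gpw_lulc_class: int  # GPW class code, 0-3
    sample_weight: int   # 1 (low mixture) to 3 (high mixture)

    def __post_init__(self) -> None:
        if self.gpw_lulc_class not in (0, 1, 2, 3):
            raise ValueError(f"invalid gpw_lulc_class: {self.gpw_lulc_class}")
        if self.sample_weight not in (1, 2, 3):
            raise ValueError(f"invalid sample_weight: {self.sample_weight}")


def write_inventory(per_dataset: list[list[HarmonizedSample]], out: TextIO) -> int:
    """Concatenate per-dataset rows, write them as CSV, and return the row count."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(HarmonizedSample)])
    writer.writeheader()
    n = 0
    for rows in per_dataset:
        for row in rows:
            writer.writerow(asdict(row))
            n += 1
    return n


# Made-up rows for illustration; they do not come from the inventory.
buf = io.StringIO()
count = write_inventory(
    [
        [HarmonizedSample("dataset_A", 2018, "grassland", 1, 1)],
        [HarmonizedSample("dataset_B", 2020, "cropland", 3, 1),
         HarmonizedSample("dataset_B", 2020, "mosaic", 0, 3)],
    ],
    buf,
)
print(count)  # 3
print(buf.getvalue().splitlines()[0])
# dataset_name,reference_year,original_lulc_class,gpw_lulc_class,sample_weight
```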

 

ACKNOWLEDGMENTS

The development of this global inventory of reference samples and EO/gridded datasets relied on valuable contributions from various sources. We would like to express our sincere gratitude to the creators and maintainers of all datasets used in this project.

 

REFERENCES

  • Brown, C. F., Brumby, S. P., Guzder-Williams, B. et al. Dynamic World, near real-time global 10 m land use land cover mapping. Sci. Data 9, 251 (2022). https://doi.org/10.1038/s41597-022-01307-4

  • Buchhorn, M., Smets, B., Bertels, L., De Roo, B., Lesiv, M., Tsendbazar, N. E., Linlin, L. & Tarko, A. Copernicus Global Land Service: Land Cover 100m: Version 3 Globe 2015-2019: Product User Manual. Zenodo, Geneva, Switzerland (September 2020). https://doi.org/10.5281/zenodo.3938963

  • d’Andrimont, R. et al. Harmonised LUCAS in-situ land cover and use database for field surveys from 2006 to 2018 in the European Union. Sci. Data 7, 352 (2020). https://doi.org/10.1038/s41597-019-0340-y

  • Fritz, S. et al. Geo-Wiki: An online platform for improving global land cover. Environmental Modelling & Software 31 (2012). https://doi.org/10.1016/j.envsoft.2011.11.015

  • Fritz, S., See, L., Perger, C. et al. A global dataset of crowdsourced land cover and land use reference data. Sci. Data 4, 170075 (2017). https://doi.org/10.1038/sdata.2017.75

  • Schneider, M., Schelte, T., Schmitz, F. & Körner, M. EuroCrops: The largest harmonized open crop dataset across the European Union. Sci. Data 10, 612 (2023). https://doi.org/10.1038/s41597-023-02517-0

  • Souza, C. M. et al. Reconstructing three decades of land use and land cover changes in Brazilian biomes with Landsat archive and Earth Engine. Remote Sens. 12, 2735 (2020). https://doi.org/10.3390/rs12172735

  • Stanimirova, R. et al. A global land cover training dataset from 1984 to 2020. Sci. Data 10, 879 (2023).

  • Stehman, S. V., Pengra, B. W., Horton, J. A. & Wellington, D. F. Validation of the U.S. Geological Survey’s Land Change Monitoring, Assessment and Projection (LCMAP) Collection 1.0 annual land cover products 1985–2017. Remote Sens. Environ. 265, 112646 (2021). https://doi.org/10.1016/j.rse.2021.112646

  • Tsendbazar, N. et al. Product validation report (D12-PVR) v1.1 (2021).

  • Tseng, G., Zvonkov, I., Nakalembe, C. L. & Kerner, H. CropHarvest: A global dataset for crop-type classification. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (2021).

  • Van Tricht, K. et al. WorldCereal: a dynamic open-source system for global-scale, seasonal, and reproducible crop and irrigation mapping. Earth Syst. Sci. Data 15, 5491–5515 (2023). https://doi.org/10.5194/essd-15-5491-2023


Created: March 21, 2025
Modified: March 21, 2025