Abstract

Remote sensing, or Earth observation (EO), is increasingly used to understand Earth system dynamics and to create continuous and categorical maps of biophysical properties and land cover, especially following recent advances in machine learning (ML). ML models typically require large, spatially explicit training datasets to make accurate predictions. Training data (TD) are typically generated by digitizing polygons on high-spatial-resolution imagery, by collecting in situ data, or by using pre-existing datasets. TD are often assumed to accurately represent the truth, but in practice almost always contain error, stemming from (1) sample design and (2) sample collection errors. The latter is particularly relevant for image-interpreted TD, an increasingly common collection method given its practicality and the growing training-sample-size requirements of modern ML algorithms. TD errors can cause substantial errors in the maps created with ML algorithms, which may affect how those maps are used and interpreted. Despite these potential errors and their real-world consequences for map-based decisions, TD error is often neither accounted for nor reported in EO research. Here we review current practices for collecting and handling TD. We identify the sources of TD error, illustrate their impacts using several case studies representing different EO applications (infrastructure mapping, global surface flux estimation, and agricultural monitoring), and provide guidelines for minimizing and accounting for TD errors. To harmonize terminology, we distinguish TD from three other classes of data that should be used to create and assess ML models: training reference data, used to assess the quality of TD during data generation; validation data, used to iteratively improve models; and map reference data, used only for final accuracy assessment.
We focus primarily on TD, but our advice applies to all four classes, and we ground our review in the established best practices of the map accuracy assessment literature. EO researchers should begin by determining the tolerable level of map error and the appropriate error metrics. Next, TD error should be minimized during sample design by choosing a representative spatio-temporal collection strategy, by using spatially and temporally relevant imagery and ancillary data sources during TD creation, and by selecting a set of legend definitions supported by the data. TD error can be further minimized during the collection of individual samples by using consensus-based collection strategies, by directly comparing interpreted training observations against expert-generated training reference data to derive TD error metrics, and by providing image interpreters with thorough, application-specific training. We strongly advise that TD error be incorporated into model outputs, either directly in bias and variance estimates or, at a minimum, by documenting the sources and implications of error. TD should be fully documented and made available via an open TD repository, allowing others to replicate and assess its use. To guide researchers in this process, we propose three tiers of TD error accounting standards. Finally, we advise researchers to clearly communicate the magnitude and impacts of TD error on map outputs, with specific consideration given to the likely map audience.
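The comparison of interpreted training observations against expert-generated training reference data can be sketched as a simple agreement calculation. The function and labels below are purely illustrative, not taken from the paper:

```python
from collections import Counter

def td_error_metrics(interpreted, reference):
    """Compare interpreter labels to expert reference labels.

    Returns overall agreement and a confusion count mapping
    (reference_label, interpreted_label) -> frequency.
    """
    assert len(interpreted) == len(reference)
    confusion = Counter(zip(reference, interpreted))
    agree = sum(n for (r, i), n in confusion.items() if r == i)
    return agree / len(reference), confusion

# Hypothetical labels for six image-interpreted samples
interpreted = ["crop", "crop", "forest", "urban", "crop", "forest"]
reference   = ["crop", "forest", "forest", "urban", "crop", "crop"]

accuracy, confusion = td_error_metrics(interpreted, reference)
print(f"interpreter agreement with reference: {accuracy:.2f}")  # 0.67
```

The off-diagonal entries of the confusion count identify which legend classes interpreters confuse most often, which can then guide interpreter training or legend revision.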

Highlights

  • To gain further insight into the level of attention training data (TD) receive in Earth observation (EO) studies, we reviewed 30 top-ranked research papers describing land cover mapping studies published within the past 10 years

  • We divide the sources of TD error into two general classes: (1) errors stemming from the design of the training sample, including some aspects of sample and response design that are shared with standards for the collection of map reference data, and (2) errors made during the collection of the training sample, including additional elements of response design such as the process of digitizing and labeling points or polygons when interpreting imagery or when collecting field measurements

  • Current practices in EO research are generally inattentive to the need to evaluate and communicate the impact of TD error on machine learning (ML)-generated maps
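The consensus-based collection strategies recommended in the abstract can be illustrated with a minimal majority-vote sketch; the function name and labels below are hypothetical, not from the paper:

```python
from collections import Counter

def consensus_label(votes):
    """Majority vote across interpreters; ties return None for expert review."""
    (top, n), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == n:
        return None  # ambiguous sample, flag for expert review
    return top

print(consensus_label(["water", "water", "wetland"]))  # water
print(consensus_label(["urban", "bare"]))              # None (tie)
```

Flagging tied votes rather than picking arbitrarily keeps ambiguous samples out of the training set until an expert resolves them, one practical way to reduce collection-related TD error.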



Introduction

Recent technological advancements have led to a new era in Earth observation (EO, also known as remote sensing), marked by rapid gains in our ability to map and measure features on the Earth’s surface, such as land cover and land use (LCLU), e.g., [1,2], vegetation cover and abundance [3], soil moisture [4], infrastructure [5,6], vegetation phenology [7,8,9], land surface albedo [10,11,12], and land surface temperature [13,14]. The increasingly popular large-scale, high-complexity neural networks (NNs) require substantially more TD than traditional statistical models and, like many ML approaches, are sensitive to noisy and biased data, creating the logistical difficulty of assembling very large, “clean” training datasets [69,70,71]. To address this need, several recent efforts have been devoted to producing extremely large training datasets that can be used across a wide range of mapping applications and serve as comprehensive benchmarks [72,73]. A related trend in large-scale mapping projects is to employ large teams of TD interpreters, often within citizen science campaigns that rely on web-based data creation tools [22,74,75,76].
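The sensitivity of ML models to biased labels can be seen even before any model is trained: asymmetric interpreter error skews the class proportions in the training set itself. A hypothetical back-of-envelope illustration, with all rates assumed for the example:

```python
# Hypothetical: interpreters omit 10% of forest (label it non-forest)
# and commit 2% of non-forest as forest. True forest fraction is 30%.
p_true = 0.30
omission, commission = 0.10, 0.02  # assumed interpreter error rates

p_observed = p_true * (1 - omission) + (1 - p_true) * commission
print(f"apparent forest fraction in TD: {p_observed:.3f}")  # 0.284
```

Because the omission and commission rates are unequal, the apparent prevalence is biased low, and a model trained on such labels inherits that bias; symmetric error rates would largely cancel.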

Characterizing Training Data Error
Map Accuracy Assessment Procedures
Current Approaches for Assessing and Accounting for Training Data Error
Sources and Impacts of Training Data Error
Design-Related Errors
Collection-Related Errors
Impacts of Training Data Error
Incorporating Noisy Training Label Data
Detecting Roads from Satellite Imagery
Step 1
Step 2
Sample Design
Training Data Sources
Legend Design
Step 3
Communicating Error
Towards an Open Training Data Repository
Findings
Conclusions
