Abstract

Remote sensing, or Earth observation (EO), is increasingly used to understand Earth system dynamics and to create continuous and categorical maps of biophysical properties and land cover, especially following recent advances in machine learning (ML). ML models typically require large, spatially explicit training datasets to make accurate predictions. Training data (TD) are typically generated by digitizing polygons on high-spatial-resolution imagery, by collecting in situ data, or by using pre-existing datasets. TD are often assumed to accurately represent the truth, but in practice almost always contain error, stemming from (1) sample design and (2) sample collection errors. The latter is particularly relevant for image-interpreted TD, an increasingly common collection method given its practicality and the growing training-sample-size requirements of modern ML algorithms. TD errors can cause substantial errors in the maps created with ML algorithms, which may affect how those maps are used and interpreted. Despite these potential errors and their real-world consequences for map-based decisions, TD error is often neither accounted for nor reported in EO research. Here we review current practices for collecting and handling TD. We identify the sources of TD error, illustrate their impacts using several case studies representing different EO applications (infrastructure mapping, global surface flux estimation, and agricultural monitoring), and provide guidelines for minimizing and accounting for TD errors. To harmonize terminology, we distinguish TD from three other classes of data that should be used to create and assess ML models: training reference data, used to assess the quality of TD during data generation; validation data, used to iteratively improve models; and map reference data, used only for final accuracy assessment.
We focus primarily on TD, but our advice applies to all four classes, and we ground our review in the established best practices of the map accuracy assessment literature. EO researchers should begin by determining the tolerable level of map error and the appropriate error metrics. Next, TD error should be minimized during sample design by choosing a representative spatio-temporal collection strategy, by using spatially and temporally relevant imagery and ancillary data sources during TD creation, and by selecting a set of legend definitions supported by the data. TD error can be further minimized during the collection of individual samples by using consensus-based collection strategies, by directly comparing interpreted training observations against expert-generated training reference data to derive TD error metrics, and by providing image interpreters with thorough, application-specific training. We strongly advise that TD error be incorporated into model outputs, either directly in bias and variance estimates or, at a minimum, by documenting the sources and implications of error. TD should be fully documented and made available via an open TD repository, allowing others to replicate and assess its use. To guide researchers in this process, we propose three tiers of TD error accounting standards. Finally, we advise researchers to clearly communicate the magnitude and impacts of TD error on map outputs, with specific consideration given to the likely map audience.
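The comparison of interpreted training observations against expert-generated training reference data can be sketched as a simple agreement calculation. The function and labels below are purely illustrative, not taken from the paper:

```python
from collections import Counter

def td_error_metrics(interpreted, reference):
    """Compare interpreter labels to expert reference labels.

    Returns overall agreement and a confusion count mapping
    (reference_label, interpreted_label) -> frequency.
    """
    assert len(interpreted) == len(reference)
    confusion = Counter(zip(reference, interpreted))
    agree = sum(n for (r, i), n in confusion.items() if r == i)
    return agree / len(reference), confusion

# Hypothetical labels for six image-interpreted samples
interpreted = ["crop", "crop", "forest", "urban", "crop", "forest"]
reference   = ["crop", "forest", "forest", "urban", "crop", "crop"]

accuracy, confusion = td_error_metrics(interpreted, reference)
print(f"interpreter agreement with reference: {accuracy:.2f}")  # 0.67
```

The off-diagonal entries of the confusion count identify which legend classes interpreters confuse most often, which can then guide interpreter training or legend revision.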

Highlights

  • To gain further insight into the level of attention training data (TD) receive in Earth observation (EO) studies, we reviewed 30 top-ranked research papers describing land cover mapping studies published within the past 10 years

  • We divide the sources of TD error into two general classes: (1) errors stemming from the design of the training sample, including some aspects of sample and response design that are shared with standards for the collection of map reference data, and (2) errors made during the collection of the training sample, including additional elements of response design such as the process of digitizing and labeling points or polygons when interpreting imagery or when collecting field measurements

  • Current practices in EO research are generally inattentive to the need to evaluate and communicate the impact of TD error on machine learning (ML)-generated maps
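The consensus-based collection strategies recommended in the abstract can be illustrated with a minimal majority-vote sketch; the function name and labels below are hypothetical, not from the paper:

```python
from collections import Counter

def consensus_label(votes):
    """Majority vote across interpreters; ties return None for expert review."""
    (top, n), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == n:
        return None  # ambiguous sample, flag for expert review
    return top

print(consensus_label(["water", "water", "wetland"]))  # water
print(consensus_label(["urban", "bare"]))              # None (tie)
```

Flagging tied votes rather than picking arbitrarily keeps ambiguous samples out of the training set until an expert resolves them, one practical way to reduce collection-related TD error.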



Introduction

Recent technological advancements have led to a new era in Earth observation (EO, also known as remote sensing), marked by rapid gains in our ability to map and measure features on the Earth’s surface, such as land cover and land use (LCLU), e.g., [1,2], vegetation cover and abundance [3], soil moisture [4], infrastructure [5,6], vegetation phenology [7,8,9], land surface albedo [10,11,12], and land surface temperature [13,14]. The increasingly popular large-scale, high-complexity neural networks (NNs) require substantially more TD than traditional statistical models and, like many ML approaches, are sensitive to noisy and biased data, creating the logistical difficulty of assembling very large, “clean” training datasets [69,70,71]. To address this need, several recent efforts have been devoted to producing extremely large training datasets that can be used across a wide range of mapping applications and serve as comprehensive benchmarks [72,73]. A related trend in large-scale mapping projects is to employ large teams of TD interpreters, often within citizen science campaigns that rely on web-based data creation tools [22,74,75,76].
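The sensitivity of ML models to biased labels can be seen even before any model is trained: asymmetric interpreter error skews the class proportions in the training set itself. A hypothetical back-of-envelope illustration, with all rates assumed for the example:

```python
# Hypothetical: interpreters omit 10% of forest (label it non-forest)
# and commit 2% of non-forest as forest. True forest fraction is 30%.
p_true = 0.30
omission, commission = 0.10, 0.02  # assumed interpreter error rates

p_observed = p_true * (1 - omission) + (1 - p_true) * commission
print(f"apparent forest fraction in TD: {p_observed:.3f}")  # 0.284
```

Because the omission and commission rates are unequal, the apparent prevalence is biased low, and a model trained on such labels inherits that bias; symmetric error rates would largely cancel.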

Characterizing Training Data Error
Map Accuracy Assessment Procedures
Current Approaches for Assessing and Accounting for Training Data Error
Sources and Impacts of Training Data Error
Design-Related Errors
Collection-Related Errors
Impacts of Training Data Error
Incorporating Noisy Training Label Data
Detecting Roads from Satellite Imagery
Step 1
Step 2
Sample Design
Training Data Sources
Legend Design
Step 3
Communicating Error
Towards an Open Training Data Repository
Findings
Conclusions
