Abstract

The current exponential increase of spatiotemporally explicit data streams from satellite-based Earth observation missions offers promising opportunities for global vegetation monitoring. Intelligent sampling through active learning (AL) heuristics provides a pathway for fast inference of essential vegetation variables by means of hybrid retrieval approaches, i.e., machine learning regression algorithms trained on radiative transfer model (RTM) simulations. In this study we summarize AL theory and perform a brief systematic literature survey of AL heuristics used in the context of Earth observation regression problems over terrestrial targets. Across all relevant studies it appeared that: (i) models trained on AL-optimized training data sets achieved higher retrieval accuracy than models trained on large randomly sampled data sets, and (ii) the Euclidean distance-based diversity (EBD) method tends to be the most efficient AL technique in terms of accuracy and computational demand. Additionally, a case study based on experimental data is presented, employing both uncertainty and diversity AL criteria. In this case study, a training database simulated by the PROSAIL-PRO canopy RTM is used to demonstrate the benefit of AL techniques for the estimation of total leaf carotenoid content (Cxc) and leaf water content (Cw). Gaussian process regression (GPR) was incorporated to minimize and optimize the training data set with AL. Training the GPR algorithm on data sets optimally sampled with AL led to improved variable retrievals compared to training on full data pools, which is further demonstrated in a mapping example. From these findings we can recommend the use of AL-based sub-sampling procedures to select the most informative samples out of large training data pools. This will not only optimize regression accuracy through the exclusion of redundant information, but also speed up processing and reduce the final model size of kernel-based machine learning regression algorithms, such as GPR.
With this study we want to encourage further testing and implementation of AL sampling methods for hybrid retrieval workflows. AL can contribute to the solution of regression problems within the framework of operational vegetation monitoring using satellite imaging spectroscopy data, and may strongly facilitate data processing for cloud-computing platforms.
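To make the diversity criterion more concrete, the following is a minimal sketch of a Euclidean distance-based selection step, assuming that "diversity" means preferring pool samples that lie far from every sample already in the training set. The function name `ebd_select` and the greedy one-at-a-time strategy are illustrative assumptions, not the implementation used in the study.

```python
import math

def ebd_select(pool, selected_idx, n_add):
    """Greedily pick n_add pool indices that are farthest from the training set.

    pool         : list of feature vectors (e.g. simulated spectra)
    selected_idx : indices of pool samples already in the training set
    n_add        : number of new samples to add
    Hypothetical helper for illustration only.
    """
    selected = list(selected_idx)
    chosen = set(selected)
    remaining = [i for i in range(len(pool)) if i not in chosen]
    added = []
    for _ in range(min(n_add, len(remaining))):
        # Score each candidate by its distance to the nearest training sample,
        # then take the candidate with the largest such distance.
        best = max(
            remaining,
            key=lambda i: min(math.dist(pool[i], pool[j]) for j in selected),
        )
        selected.append(best)
        remaining.remove(best)
        added.append(best)
    return added

# Toy pool of 2-D "spectra"; the training set starts with sample 0 only.
toy_pool = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (5.0, 5.0)]
new_samples = ebd_select(toy_pool, selected_idx=[0], n_add=2)
```

In a full AL loop, each selection step like this would typically be followed by retraining the regression model (e.g. GPR) and checking the error against validation data before continuing.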

Highlights

  • Following the promising results obtained by the selected studies, we demonstrate the efficiency of active learning (AL) heuristics for the estimation of two important biochemical crop traits

  • For Cxc, optimal results were obtained with the uncertainty-based PAL method, which reduced the root mean square error (RMSE) from > 6 to 1.33 μg/cm² when trained on 156 samples

  • The second-best results in this case were obtained with the entropy query-by-bagging (EQB) method, with a relative RMSE (rRMSE) of 23%
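The two error metrics quoted in the highlights can be sketched as follows. This is an illustrative example only: it assumes that rRMSE is the RMSE divided by the mean of the measured values, expressed in percent. Other normalizations (for instance by the range of the measurements) are also in use, and the source does not state which one was applied. The numbers are invented for demonstration and are not results from the study.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between measured and estimated values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def rrmse(observed, predicted):
    """Relative RMSE in percent, normalized by the mean of the observations.

    The choice of the mean as the normalizer is an assumption of this sketch.
    """
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_observed

# Invented toy values, for demonstration only.
measured = [10.0, 20.0, 30.0]
estimated = [12.0, 18.0, 30.0]
error_abs = rmse(measured, estimated)
error_rel = rrmse(measured, estimated)
```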


Summary

Introduction

The models are usually coupled to simulate canopy bidirectional reflectance from 400 to 2500 nm as a function of several biochemical input parameters, such as pigment, protein and water contents, and biophysical input parameters, such as LAI, average leaf inclination angle, spectral soil background, as well as observation and viewing geometries [18,19]. This modelling scheme, here called PROSAIL-PRO, can be used to establish training databases composed of vegetation properties (= RTM input) and simulated spectral signals (= RTM output), known as look-up tables (LUT). If the sample is increased only in small increments, the computational demand for convergence testing may be too high [39]. Among these three categories, AL is an auspicious technique recently applied to many machine learning problems where labeling of data is difficult, time-consuming or expensive [36]. In the context of EO analysis and modelling, AL has mainly been used in three applications: classification, e.g., [34]; emulation, e.g., [40]; and regression, e.g., [41].
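The structure of such a look-up table can be sketched as a list of (RTM input, RTM output) pairs. The example below is a minimal illustration under strong assumptions: `toy_forward_model` is a made-up placeholder and does not reproduce PROSAIL-PRO, and the parameter names and value ranges are chosen purely for demonstration.

```python
import random

def toy_forward_model(params):
    """Hypothetical stand-in for an RTM. NOT PROSAIL-PRO, only a placeholder."""
    lai, cw = params["lai"], params["cw"]
    # Three made-up "bands" that merely depend smoothly on the inputs.
    return [1.0 / (1.0 + lai), 0.5 * lai / (1.0 + lai), 1.0 / (1.0 + 100.0 * cw)]

def build_lut(forward_model, ranges, n_samples, seed=0):
    """Draw inputs uniformly within the given ranges and pair them with outputs."""
    rng = random.Random(seed)  # seeded so that the table is reproducible
    lut = []
    for _ in range(n_samples):
        inputs = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        lut.append((inputs, forward_model(inputs)))
    return lut

# Illustrative ranges only; real studies define these per variable and sensor.
example_ranges = {"lai": (0.0, 7.0), "cw": (0.001, 0.05)}
lut = build_lut(toy_forward_model, example_ranges, n_samples=5)
```

In a hybrid retrieval workflow the simulated spectra would serve as predictors and one of the input variables as the regression target, so each LUT entry becomes one training sample.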
