Abstract
Mapping aboveground forest biomass is central for assessing the global carbon balance. However, current large-scale maps show strong disparities, despite good validation statistics of their underlying models. Here, we attribute this contradiction to a flaw in the validation methods, which ignore spatial autocorrelation (SAC) in data, leading to overoptimistic assessment of model predictive power. To illustrate this issue, we reproduce the approach of large-scale mapping studies using a massive forest inventory dataset of 11.8 million trees in central Africa to train and validate a random forest model based on multispectral and environmental variables. A standard nonspatial validation method suggests that the model predicts more than half of the forest biomass variation, while spatial validation methods accounting for SAC reveal quasi-null predictive power. This study underscores how a common practice in big data mapping studies shows an apparent high predictive power, even when predictors have poor relationships with the ecological variable of interest, thus possibly leading to erroneous maps and interpretations.
Highlights
Mapping aboveground forest biomass is central for assessing the global carbon balance
We tested the same random forest (RF) model with a spatial K-fold cross-validation (CV) approach to assess how spatial autocorrelation in the data influences estimates of the model's predictive power
Reference aboveground biomass (AGB) pixels were split into 44 homogeneous spatial clusters (Fig. 3a) based on a maximum distance threshold of 150 km for observations within clusters, and clusters were alternately used as training and test sets
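The contrast between random and spatial K-fold CV described in the highlights can be sketched on synthetic data. This is a minimal illustration under stated assumptions, not the study's pipeline: a 1-nearest-neighbour predictor stands in for the random forest, point coordinates stand in for spatially structured predictors, and the data, cluster layout, and all names (`cross_validate`, `predict_1nn`, `spatial_folds`, and so on) are hypothetical.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination, pooled over all held-out predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def predict_1nn(train, x, y):
    """Return the AGB of the training point nearest to (x, y).

    Stand-in model (assumption): the study uses a random forest.
    """
    nearest = min(train, key=lambda p: (p[0] - x) ** 2 + (p[1] - y) ** 2)
    return nearest[2]


def cross_validate(points, folds):
    """Hold out each fold in turn and return R^2 over all held-out points."""
    y_true, y_pred = [], []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [p for i, p in enumerate(points) if i not in held_out]
        for i in test_idx:
            x, y, agb, _ = points[i]
            y_true.append(agb)
            y_pred.append(predict_1nn(train, x, y))
    return r_squared(y_true, y_pred)


# Synthetic, spatially autocorrelated data (hypothetical values):
# 10 clusters on a 5 x 2 grid, 1000 km apart, each with its own mean AGB.
rng = random.Random(0)
N_CLUSTERS, PER_CLUSTER = 10, 30

points = []  # tuples of (x_km, y_km, agb, cluster_id)
for c in range(N_CLUSTERS):
    cx, cy = (c % 5) * 1000.0, (c // 5) * 1000.0
    cluster_mean = 100.0 + 40.0 * ((7 * c) % 10)  # not ordered in space
    for _ in range(PER_CLUSTER):
        points.append((
            cx + rng.gauss(0, 50),            # within-cluster spread (km)
            cy + rng.gauss(0, 50),
            cluster_mean + rng.gauss(0, 20),  # local AGB noise
            c,
        ))

# Nonspatial validation: 5 random folds, so every test point has
# close neighbours from its own cluster in the training set.
indices = list(range(len(points)))
rng.shuffle(indices)
random_folds = [indices[k::5] for k in range(5)]

# Spatial validation: each cluster is held out as a whole.
spatial_folds = [
    [i for i, p in enumerate(points) if p[3] == c] for c in range(N_CLUSTERS)
]

r2_random = cross_validate(points, random_folds)
r2_spatial = cross_validate(points, spatial_folds)
print(f"random K-fold R^2:  {r2_random:.2f}")
print(f"spatial K-fold R^2: {r2_spatial:.2f}")
```

In this setup the predictor carries no information about biomass beyond location, so the random split should report a high R² simply because training and test points are neighbours, while the leave-one-cluster-out split should report a much lower one.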
Summary
Mapping aboveground forest biomass is central for assessing the global carbon balance. However, current large-scale maps show strong disparities, despite good validation statistics of their underlying models. We attribute this contradiction to a flaw in the validation methods, which ignore spatial autocorrelation (SAC) in data, leading to overoptimistic assessment of model predictive power. To illustrate this issue, we reproduce the approach of large-scale mapping studies using a massive forest inventory dataset of 11.8 million trees in central Africa to train and validate a random forest model based on multispectral and environmental variables. Two reference pantropical carbon density maps have been produced using a combination of environmental and remote sensing (RS) predictors[2,3]. These maps have been used in high-ranking studies to estimate greenhouse gas emissions[5], to assess the relationships between forest carbon and biodiversity[6], climate[7], and land management[8,9,10], or even to evaluate the sensitivity of new space-borne sensors to aboveground biomass (AGB)[4,11,12]. Environmental conditions were characterized using climate variables from the WorldClim-2 database[27] and topographic variables derived from SRTM data.