Abstract

This study provided a comprehensive evaluation of eight machine learning regression algorithms for estimating forest aboveground biomass (AGB) from satellite data, using leaf area index, canopy height, net primary production, and tree cover data together with climatic and topographical data. Some of these algorithms, such as extremely randomized trees, stochastic gradient boosting, and categorical boosting (CatBoost) regression, have not been commonly used for forest AGB estimation. For each algorithm, the hyperparameters were optimized using grid search with cross-validation, the optimal AGB model was developed on the training dataset (80%), and AGB was predicted on the test dataset (20%). Performance metrics, feature importance, and overestimation and underestimation were used as indicators for evaluating each algorithm. To reduce the impact of the random training-test split and the sampling method on performance, these procedures were repeated 50 times for each algorithm under the random sampling, stratified sampling, and separate modeling scenarios. The results showed that the five tree-based ensemble algorithms performed better than the three nonensemble algorithms (multivariate adaptive regression splines, support vector regression, and multilayer perceptron), and that the CatBoost algorithm outperformed the other algorithms for AGB estimation. Compared with the random sampling scenario, the stratified sampling and separate modeling scenarios did not significantly improve the AGB estimates, but modeling AGB for each forest type separately provided stable results in terms of the contributions of the predictor variables to the AGB estimates. With all the algorithms, forest AGB was underestimated when the AGB values were larger than 210 Mg/ha and overestimated when the AGB values were less than 120 Mg/ha.
This study highlighted the capability of ensemble algorithms to improve AGB estimates and the necessity of improving AGB estimates at high and low AGB levels in future studies.
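
The repeated evaluation protocol described in the abstract (an 80/20 training-test split, drawn either at random or stratified, repeated 50 times and averaged) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the synthetic AGB values, and the forest-type labels are all hypothetical, hyperparameter tuning by grid search is omitted, and a training-mean predictor stands in for the regression algorithms.

```python
import random
from statistics import mean


def random_split(n, test_frac, rng):
    """Shuffle indices 0..n-1 and return (train, test) index lists."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def stratified_split(strata, test_frac, rng):
    """Split within each stratum so the test set keeps the class proportions."""
    groups = {}
    for i, label in enumerate(strata):
        groups.setdefault(label, []).append(i)
    train, test = [], []
    for members in groups.values():
        members = members[:]
        rng.shuffle(members)
        n_test = round(len(members) * test_frac)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5


def repeated_evaluation(y, strata=None, n_runs=50, test_frac=0.2, seed=0):
    """Repeat the split/fit/predict cycle and return (mean RMSE, per-run RMSE)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        if strata is None:
            train, test = random_split(len(y), test_frac, rng)
        else:
            train, test = stratified_split(strata, test_frac, rng)
        # Placeholder model (assumption): predict the training-set mean.
        # A real study would fit a tuned regressor on the training rows here.
        baseline = mean(y[i] for i in train)
        scores.append(rmse([y[i] for i in test], [baseline] * len(test)))
    return mean(scores), scores


# Synthetic AGB values (Mg/ha) and hypothetical forest-type labels.
data_rng = random.Random(42)
agb = [data_rng.uniform(20.0, 400.0) for _ in range(100)]
forest_type = ["broadleaf"] * 60 + ["needleleaf"] * 40

mean_random, runs_random = repeated_evaluation(agb)
mean_stratified, runs_stratified = repeated_evaluation(agb, strata=forest_type)
print(f"random sampling:     mean RMSE over {len(runs_random)} runs = {mean_random:.2f}")
print(f"stratified sampling: mean RMSE over {len(runs_stratified)} runs = {mean_stratified:.2f}")
```

Averaging over many splits, rather than reporting a single split, keeps one lucky or unlucky partition from dominating the comparison between algorithms.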

Highlights

  • Forest biomass is an essential climate variable that measures the net carbon dioxide exchange between the land surface and the atmosphere [1]

  • Averaging the performance metrics of 50 runs for each regression algorithm, we found that the CatBoost algorithm had the best overall performance in estimating forest aboveground biomass (AGB) from multiple satellite data products, with a mean R-squared of 0.71, root mean square error (RMSE) of 46.67 Mg/ha, and relative RMSE of 26% (Table 3)

  • The results showed that forest AGB estimated with the tree-based ensemble algorithms, including the random forests (RFs), extremely randomized trees (ERT), gradient-boosted regression tree (GBRT), stochastic gradient boosting (SGB), and CatBoost algorithms, had mean R2 values over 50 runs ranging from 0.69 to 0.71, RMSE ranging from 46.67 to 47.95 Mg/ha, bias ranging from −0.21 to 0.10 Mg/ha, and relative RMSE ranging from 26.00 to 26.72%. These estimates were more accurate than those from the multivariate adaptive regression splines (MARS), support vector regression (SVR), and multilayer perceptron (MLP) algorithms, which had mean R2 values ranging from 0.56 to 0.66, RMSE ranging from 50.34 to 56.69 Mg/ha, bias ranging from −0.02 to 1.55 Mg/ha, and relative RMSE ranging from 28.05 to 31.58%
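
The four metrics reported in the highlights (R2, RMSE, bias, and relative RMSE) can be computed as in the sketch below. The exact definitions used in the paper are not given in this summary, so the following are assumptions: R2 is taken as the coefficient of determination, bias as the mean of predicted minus observed, and relative RMSE as RMSE divided by the mean observed AGB. The helper bias_by_range is hypothetical; it groups residuals by the 120 and 210 Mg/ha thresholds mentioned in the abstract to expose overestimation at low AGB and underestimation at high AGB.

```python
from math import sqrt


def regression_metrics(observed, predicted):
    """Return R2, RMSE, bias, and relative RMSE (%) under the assumed definitions."""
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = sqrt(ss_res / n)
    return {
        "r2": 1 - ss_res / ss_tot,             # coefficient of determination
        "rmse": rmse,                           # same unit as the input (Mg/ha)
        "bias": sum(residuals) / n,             # positive means overestimation
        "relative_rmse_pct": 100 * rmse / mean_obs,
    }


def bias_by_range(observed, predicted, low=120.0, high=210.0):
    """Mean residual for observed AGB below `low`, between, and above `high`."""
    groups = {"low": [], "mid": [], "high": []}
    for o, p in zip(observed, predicted):
        key = "low" if o < low else "high" if o > high else "mid"
        groups[key].append(p - o)
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}


# Made-up observed and predicted AGB values (Mg/ha), for illustration only.
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 200.0, 290.0]
metrics = regression_metrics(observed, predicted)
ranges = bias_by_range(observed, predicted)
print(metrics)
print(ranges)
```

An overall bias close to zero can hide opposite errors at the two ends of the AGB range, which is why the per-range view is reported alongside the global metrics.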

Summary

Introduction

Forest biomass is an essential climate variable that measures the net carbon dioxide exchange between the land surface and the atmosphere [1]. There is widespread consensus among studies that forest AGB can be best estimated from a combination of field measurements and remotely sensed datasets. Based on both types of data, many forest AGB maps have been produced at local, regional, or global scales using various algorithms [2,3,4,5,6,7,8]. To improve the accuracy of AGB estimation, some recent studies have proposed that efforts should be made to compile field biomass data extensively, integrate multiple remote sensing datasets, explore novel approaches, and comprehensively address the uncertainty associated with biomass estimates [11,14,15,16].
