Most state-of-the-art research applies remote sensing to above-ground biomass (AGB) and soil organic carbon (SOC) separately, although some studies indicate a positive correlation between the two. We combine the two domains to improve state-of-the-art total carbon estimation. We begin by establishing a baseline model for our study area in Scotland, using state-of-the-art methodologies from the SOC and AGB domains. We then investigate how feature engineering techniques, such as variance inflation factor (VIF) analysis and feature selection, affect the machine learning models, and extend this by combining predictor variables from the two domains. Finally, we leverage the possible correlation between AGB and SOC to establish a relationship between the two and propose novel models in an attempt to outperform the state-of-the-art results. We compare three machine learning techniques that have been demonstrated to be the most effective in both domains: boosted regression tree (BRT), random forest (RF), and XGBoost (XGB). This research makes three contributions: (i) Including a Digital Elevation Model (DEM) as a predictor variable in the AGB model improves the model result by 13.5% on average across the three machine learning techniques tested, implying that DEM should be considered for AGB estimation as well, even though it has previously been used exclusively for SOC estimation. (ii) Using SOC and SOC density as predictors improves the AGB model by 14.2% on average relative to the state-of-the-art baseline: between Model B and Model H, R² rises from 0.5016 to 0.5604 for BRT, from 0.4958 to 0.5925 for RF, and from 0.5161 to 0.5750 for XGB. This strengthens our experimental results and suggests combining AGB and SOC as a joint study domain in future research. (iii) Including AGB as a predictor variable for SOC improves model performance for RF but reduces it for BRT and XGB, indicating that the results are model-specific and that more research is required on the feature space and modeling techniques. Additionally, we propose a method for estimating total carbon using data from Sentinel-1, Sentinel-2, Landsat 8, digital elevation data, and the Forest Inventory.
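The abstract does not state how the 14.2% average in contribution (ii) is computed. As a minimal sketch, assuming it is the mean of the per-model relative increases in R² between Model B and Model H, the quoted values reproduce it. The variable and function names below are illustrative and do not come from the paper.

```python
# R² values quoted in the abstract for the AGB model.
# Model B is the baseline; Model H adds SOC and SOC density as predictors.
r2_model_b = {"BRT": 0.5016, "RF": 0.4958, "XGB": 0.5161}
r2_model_h = {"BRT": 0.5604, "RF": 0.5925, "XGB": 0.5750}


def relative_gain_percent(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Assumption: the reported average is the unweighted mean of per-model gains.
gains = {
    model: relative_gain_percent(r2_model_b[model], r2_model_h[model])
    for model in r2_model_b
}
mean_gain = sum(gains.values()) / len(gains)

for model, gain in gains.items():
    print(f"{model}: {gain:.1f}%")
print(f"mean: {mean_gain:.1f}%")
```

Under this assumption the per-model gains are roughly 11.7% (BRT), 19.5% (RF), and 11.4% (XGB), and their mean rounds to the reported 14.2%.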