Abstract

Understanding the spatial distribution of soil organic carbon (SOC) content across different climatic regions will improve our knowledge of carbon gains and losses due to climate change. However, little is known about SOC content in the contrasting arid and sub-humid regions of Iran, where complex SOC–landscape relationships pose a challenge to spatial analysis. Machine learning (ML) models within a digital soil mapping framework can capture such complex relationships. Current research focuses on ensemble ML models to increase prediction accuracy, and the usual ensemble methods are boosting and weighted averaging. This study proposes a novel ensemble technique: the stacking of multiple ML models through a meta-learning model. We first applied six state-of-the-art ML models (i.e., Cubist, random forests (RF), extreme gradient boosting (XGBoost), classical artificial neural network models (ANN), neural network ensemble based on model averaging (AvNNet), and deep learning neural networks (DNN)) to predict and map the spatial distribution of SOC content at six soil depth intervals for both regions. We then tested the stacking of these models through a meta-learning model, with and without rescanning the covariate space, to maximize the prediction accuracy. Of the six ML models, the DNN gave the best modeling accuracy, followed by RF, XGBoost, AvNNet, ANN, and Cubist. Importantly, stacking yielded a significant improvement in the prediction of SOC content, especially when combined with rescanning the covariate space. For instance, for the upper 0–5 cm of the soil profiles, the RMSE values obtained by the proposed stacking approaches were 17% (arid site) and 9% (sub-humid site) lower than those obtained by the DNN, the best individual model. This indicates that rescanning the original covariate space by a meta-learning model can extract more information and improve the SOC content prediction accuracy. Overall, our results suggest that the stacking of diverse sets of models could be used to more accurately estimate the spatial distribution of SOC content in different climatic regions.
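
The two-level stacking idea described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the level-0 learners here are two simple k-nearest-neighbour regressors standing in for the six ML models, the meta-learner is a plain linear model, the data are synthetic, and every function name is hypothetical. The sketch also assumes that "rescanning the covariate space" means the meta-learner receives the original covariates alongside the level-0 predictions.

```python
import math
import random

def fit_linear(X, y, ridge=1e-6):
    """Linear model with intercept, fitted by solving the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # A tiny ridge term on the diagonal keeps the system numerically stable.
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * ac for a, ac in zip(A[r], A[c])]
                b[r] -= f * b[c]
    w = [b[i] / A[i][i] for i in range(p)]
    return lambda x: w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def make_knn(k):
    """Return a fitter for a k-nearest-neighbour regressor (level-0 stand-in)."""
    def fit(X, y):
        def predict(x):
            order = sorted(range(len(X)),
                           key=lambda i: sum((a - c) ** 2 for a, c in zip(X[i], x)))
            near = order[:k]
            return sum(y[i] for i in near) / len(near)
        return predict
    return fit

def fit_stack(X, y, base_fitters, rescan=False, n_folds=5):
    """Level 0: out-of-fold predictions. Level 1: linear meta-learner."""
    n = len(X)
    oof = [[0.0] * len(base_fitters) for _ in range(n)]
    for held in (list(range(f, n, n_folds)) for f in range(n_folds)):
        held_set = set(held)
        train = [i for i in range(n) if i not in held_set]
        for m, fitter in enumerate(base_fitters):
            model = fitter([X[i] for i in train], [y[i] for i in train])
            for i in held:
                oof[i][m] = model(X[i])
    # With rescan=True the meta-learner also sees the original covariates.
    meta_X = [oof[i] + (list(X[i]) if rescan else []) for i in range(n)]
    meta = fit_linear(meta_X, y)
    base = [fitter(X, y) for fitter in base_fitters]  # refit on all data
    def predict(x):
        z = [m(x) for m in base]
        return meta(z + (list(x) if rescan else []))
    return predict

def rmse(model, X, y):
    return (sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)) ** 0.5

# Synthetic stand-in data: two covariates and a nonlinear target.
rng = random.Random(0)
X_all = [[rng.random(), rng.random()] for _ in range(200)]
y_all = [3 * a + math.sin(6 * b) + rng.gauss(0, 0.1) for a, b in X_all]
X_train, y_train, X_test, y_test = X_all[:150], y_all[:150], X_all[150:], y_all[150:]

fitters = [make_knn(3), make_knn(15)]
stack_plain = fit_stack(X_train, y_train, fitters, rescan=False)
stack_rescan = fit_stack(X_train, y_train, fitters, rescan=True)
print("RMSE, stacking:", rmse(stack_plain, X_test, y_test))
print("RMSE, stacking with rescanning:", rmse(stack_rescan, X_test, y_test))
```

The meta-learner is trained on out-of-fold predictions so that it never sees level-0 predictions made on data those models were fitted to. Whether rescanning helps on a given dataset is an empirical question; the sketch only shows where the extra covariates enter.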

Highlights

  • Soil organic carbon (SOC) storage is a key function of soils, influencing soil physicochemical properties [1,2], e.g., soil water storage capacity, nutrient holding capacity, and infiltration rate

  • We used the conditioned Latin Hypercube Sampling, which provides an optimal stratification of the covariate space [45,46], to select representative sample locations based on the covariates [47,48,49,50,51,52]

  • We estimated the soil organic carbon (SOC) at six depth intervals of 0–5, 5–15, 15–30, 30–60, 60–100, and 100–200 cm, in accordance with the standard depths specified by the GlobalSoilMap project [55]
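
The sampling highlight above can be made concrete with a rough sketch of the core idea behind conditioned Latin hypercube sampling: choose n candidate locations so that each covariate has about one sample in each of its n quantile strata. This is a simplification under stated assumptions, not the procedure used in the study. It uses greedy random swaps where the published algorithm uses simulated annealing, it keeps only the continuous-covariate term of the objective, and all function names are hypothetical.

```python
import random

def strata_edges(values, n):
    """Inner quantile edges splitting one covariate into n strata."""
    s = sorted(values)
    return [s[(len(s) * q) // n] for q in range(1, n)]

def stratum(value, edges):
    """Index (0 to n-1) of the stratum that contains value."""
    return sum(value >= e for e in edges)

def objective(sample_idx, strata, n):
    """Sum over covariates and strata of |samples in stratum - 1|."""
    total = 0
    for col in strata:  # col[i] is the stratum of candidate i
        counts = [0] * n
        for i in sample_idx:
            counts[col[i]] += 1
        total += sum(abs(c - 1) for c in counts)
    return total

def clhs_sketch(covariates, n, iters=2000, seed=0):
    """Greedy swap search for n rows that fill the strata evenly."""
    rng = random.Random(seed)
    m, p = len(covariates), len(covariates[0])
    strata = []
    for j in range(p):
        column = [row[j] for row in covariates]
        edges = strata_edges(column, n)
        strata.append([stratum(v, edges) for v in column])
    chosen = rng.sample(range(m), n)
    start = best = objective(chosen, strata, n)
    for _ in range(iters):
        if best == 0:
            break  # perfect Latin hypercube reached
        pos, cand = rng.randrange(n), rng.randrange(m)
        if cand in chosen:
            continue
        trial = chosen[:pos] + [cand] + chosen[pos + 1:]
        score = objective(trial, strata, n)
        if score <= best:  # keep swaps that do not worsen the fit
            chosen, best = trial, score
    return chosen, start, best

# Synthetic candidate grid: 500 locations, 3 covariates.
rng = random.Random(1)
grid = [[rng.random(), rng.gauss(0, 1), rng.random() ** 2] for _ in range(500)]
chosen, start, best = clhs_sketch(grid, n=10)
print("objective before and after search:", start, best)
```

An objective of zero would mean every stratum of every covariate holds exactly one sampled location; in practice the search stops at the best configuration it finds within the iteration budget.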

Summary

Introduction

Soil organic carbon (SOC) storage is a key function of soils, influencing soil physicochemical properties [1,2], e.g., soil water storage capacity, nutrient holding capacity, and infiltration rate. Accurate information on the spatial distribution of SOC is vital to estimate and predict greenhouse gas emissions and physicochemical functions of soils [4,5]. Such information is most important in arid and semi-arid areas, where soils tend to have low levels of organic carbon [6,7] compared with humid regions. Unlike bagging, boosting, and averaging methods, stacking ensemble modeling is rarely explored in digital soil mapping. Tajik et al. [40], Zhou et al. [41], and Chen et al. [42] recently evaluated the efficacy of ensemble models, built by averaging the model predictions, for predicting the spatial variation of soil properties in Iran, China, and France, respectively. To the best of our knowledge, no study has conducted digital mapping of SOC content using stacking approaches under different climatic conditions. The predominant soils of the study area [43] are Kastanozems (~70% of the area), Cambisols (~25%), and Chernozems (~5%).

Data Collection and Soil Sample Analysis
Covariates Used for the Development of ML Models
Covariate Selection
The Individual ML Models in Level 0
Meta-Learning Models in Level 1
Optimizing the Hyper-Parameters of Machine Learning Models
Performances of the Individual ML Models
Performances of the Stacking Ensemble Models
Findings
Performances of ML Models in Two Different Climatic Regions