Exploring the impact of environmental factors on soil organic carbon (SOC) with machine learning (ML) models is of great significance for climate change mitigation, soil carbon sequestration, and emission reduction. However, traditional ML models are limited by manual trial-and-error hyperparameter tuning and by the poor interpretability of the fitting process, so their precision and performance cannot be fully exploited. To this end, this study developed a tree-structured Parzen estimator-extreme gradient boosting (TPE-XGBoost) method combined with SHapley Additive exPlanations (SHAP) analysis to examine how SOC (0-200 cm) responds to climate, human activities, soil properties, and terrain under different land use types in China. Descriptive statistics showed that SOC content followed the order forest land > grassland > cultivated land > unused land. SOC content decreased continuously with increasing soil depth for all land use types, and the values followed a left-skewed, non-normal distribution. The fitting accuracy (R²) of the TPE-XGBoost model for SOC content was greater than 0.8. Prediction accuracy was highest at the 0-5 cm depth for cultivated land (R² = 0.96), grassland (R² = 0.93), forest land (R² = 0.95), and unused land (R² = 0.95). SHAP analysis showed that the factors contributing most to the fitting accuracy across all depths for cultivated land, grassland, forest land, and unused land were temperature, soil pH, temperature, and elevation, respectively. From surface to deep soil, the mean SHAP value showed a downward trend, indicating that the driving force of environmental factors on SOC content gradually weakened.
In the variance partitioning (VP) analysis, the individual explanatory fractions of climate, terrain, and soil properties for cultivated land (0-200 cm), forest land (30-60 cm), and unused land (0-200 cm) were as high as 0.32, 0.17, and 0.16, respectively, indicating that SOC content responds strongly to these environmental factors. Suitable temperatures not only promote nutrient uptake by plant roots but also interact with soil pH to affect microorganisms, thereby increasing SOC content. The results confirm that the TPE-XGBoost model combined with SHAP analysis can reliably explain the nonlinear driving effects of environmental factors on SOC, providing credible decision support for carbon budget accounting and carbon sequestration in large-scale regions.