Abstract

Machine learning (ML) models have been widely used to predict the spatial variability of soil heavy metals. However, it is impossible to explore the entire hyperparameter space of ML models through manual trial-and-error experimentation. Here, an automated hyperparameter optimization-based machine learning (HPO-ML) method, combining three search algorithms with random forest (RF) and extreme gradient boosting (XGBoost) models, was developed to predict the heavy metal content in soil from multiple environmental variables. The tree-structured Parzen estimator (TPE) algorithm outperformed the other search algorithms in identifying the optimal hyperparameters of the RF and XGBoost models. The prediction results showed that TPE-XGBoost had the highest accuracy for predicting the As (RMSE = 3.06 mg kg−1 and R2 = 70.35%), Cd (RMSE = 0.10 mg kg−1 and R2 = 75.43%), Cr (RMSE = 13.86 mg kg−1 and R2 = 82.11%), Ni (RMSE = 3.19 mg kg−1 and R2 = 75.20%), Pb (RMSE = 3.75 mg kg−1 and R2 = 74.79%), and Zn (RMSE = 6.83 mg kg−1 and R2 = 70.05%) contents. The TPE-XGBoost mapping results showed that high concentrations of soil heavy metals were located in the central and eastern areas (As), along the main stream of the Yellow River (Cd), in the northeast area (Cr), along the ancient watercourse of the Yellow River (Ni and Pb), and in the central and northeastern areas (Zn). SHapley Additive exPlanations (SHAP) and a structural equation model (SEM) were used to interpret how the environmental variables drive the heavy metal contents. The variables with the highest contributions to predicting the As, Cd, Cr, Ni, Pb, and Zn contents were CO, PM2.5, O3, PC3, PC1, and PC4, respectively, and there was a significant source-receptor coupling path. The results demonstrate the feasibility of using the HPO-ML approach in hyperparameter-limited conditions, providing data-driven pathways and options to support the high-quality development of agriculture and the protection of farmland soil ecosystems.
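
The abstract reports accuracy as RMSE (in mg kg−1) and R2 (as a percentage) but does not state the formulas used. The sketch below assumes the standard definitions: RMSE as the root of the mean squared residual, and R2 as the coefficient of determination, 1 − SS_res / SS_tot, multiplied by 100. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the same unit as the inputs."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_residuals = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_residuals) / len(observed))


def r2_percent(observed, predicted):
    """Coefficient of determination, expressed as a percentage (assumed convention)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 100.0 * (1.0 - ss_res / ss_tot)


# Hypothetical observed and predicted contents (mg kg-1), for illustration only.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]

print(f"RMSE = {rmse(observed, predicted):.2f} mg kg-1")
print(f"R2 = {r2_percent(observed, predicted):.2f}%")
```

For these illustrative values the residual sum of squares is 2 and the total sum of squares is 20, so R2 is 90% and the RMSE is about 0.71 mg kg−1. Under these definitions a lower RMSE and a higher R2 both indicate a better fit.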
