Investigating the bioaccessibility of harmful inorganic elements in soil is crucial for understanding their environmental behavior and for accurately assessing soil-related environmental risks. Traditional batch experiments and linear models, however, are time-consuming and often fail to quantify bioaccessibility precisely. In this study, using 937 data points gathered from 56 journal articles, we developed machine learning models for three harmful inorganic elements: cadmium (Cd), lead (Pb), and arsenic (As). The model optimized through a boosting ensemble strategy performed best, with an average R² of 0.95 and an RMSE of 0.25. We further combined SHAP values with quantitative analysis to identify the key features that influence bioaccessibility. Using the integrated models, we predicted bioaccessibility for 3002 data points across China, clarifying the bioaccessibility of Cd, Pb, and As in the soils of various sites, and constructed a comprehensive spatial distribution map of China using inverse distance weighting (IDW) interpolation. Based on these findings, we derived soil environmental standards for metallurgical sites in China. In the collected data, the number of mining/smelting sites exceeding the standard levels for Cd, Pb, and As fell from 5, 58, and 14 to 1, 24, and 7, respectively. This research offers a precise and scientific approach for cross-regional risk assessment at the continental scale and lays a solid foundation for soil environmental management.
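The abstract names IDW interpolation without describing it. As a rough illustration only, the sketch below estimates a value at an unsampled location as a distance-weighted average of nearby samples. The function name `idw_estimate`, the power parameter of 2, the planar Euclidean distance, and the sample values are all assumptions made for this example and are not taken from the study.

```python
import math


def idw_estimate(samples, query, power=2.0):
    """Estimate a value at `query` from `samples` by inverse distance weighting.

    samples: list of ((x, y), value) pairs at sampled locations (hypothetical data).
    query:   (x, y) location to estimate.
    power:   distance exponent; 2.0 is a common default, assumed here.
    """
    if not samples:
        raise ValueError("at least one sample is required")

    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in samples:
        # Planar Euclidean distance; a real map would need a suitable projection.
        distance = math.hypot(query[0] - x, query[1] - y)
        if distance == 0.0:
            # The query coincides with a sample, so return its value directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical bioaccessibility values (%) at two sampled sites.
sites = [((0.0, 0.0), 10.0), ((2.0, 0.0), 30.0)]
midpoint_estimate = idw_estimate(sites, (1.0, 0.0))
```

Closer samples dominate the estimate because each weight falls off as the inverse of the distance raised to `power`; a larger exponent makes the interpolated surface more local.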