Abstract

<p>Heavy metal contamination of soil is a major environmental issue, intensified by rapid industrial and population growth. Understanding the spatial distribution of heavy metals in soil is a necessary precondition for monitoring soil health and assessing ecological risks. Heavy metals in soil originate from both natural and anthropogenic sources. Natural sources typically involve the release of heavy metals from rock by weathering and atmospheric precipitation. Anthropogenic sources are related to industrialization, rapid urbanization, agricultural practices, and military activities. We analyzed a total of 358 topsoil samples (0–30 cm) collected in Golestan province in the northeast of Iran, based on a regular square grid network of 1,700 squares, each 2.5 km² in size (random sampling within the grid). From these samples, we determined the spatial distribution of Cd, Cu, Ni, Zn, and Pb using random forest (RF). A multi-spectral image (Landsat 8) and environmental derivatives calculated from terrain attributes, climatic parameters, parent material, land use maps, and distances to mine sectors, main roads, industrial sites, and rivers were used as covariates to predict the spatial distribution of heavy metal concentrations. Multi-collinearity among the predictors was examined using the variance inflation factor (VIF), and a feature selection process (genetic algorithm) was applied to avoid noise and optimize the input variables for the final model. The predictive accuracy of the RF model was assessed by the mean prediction error (ME), root mean squared error (RMSE), and coefficient of determination (R<sup>2</sup>) using 5-fold cross-validation. The results showed that the concentrations (mg kg<sup>-1</sup>) of Cd, Cu, Pb, Ni, and Zn ranged from 0.02 to 2.75, 9.70 to 93.70, 6.80 to 114.20, 9.50 to 93.20, and 25.10 to 417.4, respectively.
The best prediction performance was obtained for Ni (RMSE = 9.9 mg kg<sup>-1</sup> and R<sup>2</sup> = 56.6%), and the poorest for Cd (RMSE = 0.4 mg kg<sup>-1</sup> and R<sup>2</sup> = 28.0%). Environmental covariates that control soil moisture and water flow, along with climatic factors, were the most important variables in defining the spatial distribution of soil heavy metals. We conclude that the RF model using easily accessible environmental covariates is a promising, cost-effective, and fast approach to monitoring the spatial distribution of heavy metal contamination in soils.</p><p><strong>Keywords:</strong> Heavy metals; digital soil mapping; machine learning; random forest; spatial variation; soil pollution.</p>
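
<p>The validation scheme named in the abstract (ME, RMSE, and R<sup>2</sup> under 5-fold cross-validation) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the sign convention for ME (predicted minus observed), the definition of R<sup>2</sup> as 1 - SS<sub>res</sub>/SS<sub>tot</sub>, and the pooling of held-out predictions across folds are all assumptions. To stay dependency-free, a placeholder mean predictor stands in for the random forest; any regressor with the same fit-then-predict shape could be swapped in.</p>

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def mean_error(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean prediction error (bias). Assumed convention: predicted minus observed."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the observations."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Hypothetical model interface: (X_train, y_train, X_test) -> predictions for X_test.
FitPredict = Callable[[list, list, list], List[float]]


def cross_validate(X: Sequence, y: Sequence[float], fit_predict: FitPredict,
                   k: int = 5, seed: int = 0) -> Tuple[float, float, float]:
    """Return (ME, RMSE, R2) computed on pooled held-out predictions."""
    pred = [0.0] * len(y)
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        fold_pred = fit_predict([X[i] for i in train_idx],
                                [y[i] for i in train_idx],
                                [X[i] for i in test_idx])
        for i, p in zip(test_idx, fold_pred):
            pred[i] = p
    return mean_error(y, pred), rmse(y, pred), r_squared(y, pred)


def mean_baseline(X_train: list, y_train: list, X_test: list) -> List[float]:
    """Placeholder for the random forest: always predicts the training mean."""
    m = sum(y_train) / len(y_train)
    return [m] * len(X_test)


if __name__ == "__main__":
    # Made-up toy values, NOT data from the study.
    toy_X = [[float(i)] for i in range(20)]
    toy_y = [10.0 + 2.0 * i for i in range(20)]
    me, err, r2 = cross_validate(toy_X, toy_y, mean_baseline, k=5)
    print(f"ME={me:.3f}  RMSE={err:.3f}  R2={r2:.3f}")
```

<p>A mean-only baseline cannot beat the overall mean on held-out data, so its cross-validated R<sup>2</sup> is at or below zero. That gives a floor against which values such as those reported for Ni and Cd can be read.</p>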

