Driven by the rapid growth of big data, applying Machine Learning (ML) techniques to detect Soil Pollution (SP) at Potentially Contaminated Sites (PCS) across multiple sectors and regional scales has become a prominent research focus. Because key indices of SP sources and pathways are difficult to obtain, current methods suffer from low predictive accuracy and a weak scientific basis. This study collected environmental data on heavy metal and organic contamination from 200 PCS across six representative sectors. Twenty-one indices covering fundamental site information, SP potential from products and raw materials, SP prevention level, and the migration capability of soil contaminants were used to build the SP detection index system. Feature fusion by consolidation computation was then used to integrate the scores into a new feature group of 11 indicators. This feature subset was used to train ML models, including Random Forests (RF), Support Vector Machines (SVM), and Multilayer Perceptrons (MLP), which were evaluated to determine the subset's effect on SP identification. The results indicate that the four indices newly created by feature fusion show an association with SP comparable to that of the original indices. The component analysis suggests that several indices related to fundamental information, contamination potential from products and raw materials, and SP prevention level influence SP significantly, though to varying extents. The index describing the migration capability of soil contaminants has minimal influence on the SP classification task within PCS. This research introduces a novel technical approach for identifying SP using big data and ML techniques, and it offers a reference and scientific basis for the environmental management of PCS and for SP mitigation.