Geogenic contaminated groundwater (GCG), characterized by elevated arsenic, fluoride, and iodine levels, presents a significant challenge to public health and government management. Conventional survey-based approaches, which collect groundwater samples, conduct physicochemical tests, and apply spatial interpolation to obtain regional maps of groundwater chemical components, are inefficient and costly. More importantly, they do not take into account the actual hydrogeological conditions or the characteristics of pollutant transport and enrichment. To address this issue, we used Support Vector Machine (SVM), Random Forest (RF), Adaptive Boosting (AdaBoost), and Extreme Gradient Boosting (XGBoost) to analyze the likelihood of occurrence of arsenic, fluoride, and iodine, as well as their spatial distribution, in shallow groundwater of the Hetao Basin. Our study incorporated 20 indicators related to meteorology, soil physicochemical properties, and groundwater conditions, along with 1505 labeled samples consisting of groundwater arsenic, fluoride, and iodine concentrations and their corresponding coordinates. Machine learning models were constructed from these data to relate the meteorological, soil physicochemical, and groundwater indicators to the occurrence of GCG. To select the best prediction model, we quantitatively evaluated the prediction performance of each model using accuracy (AC), area under the curve (AUC), and mean squared error (MSE), and then selected the optimized model for predicting the spatial distribution of GCG.
The results showed that the XGBoost algorithm provided the best predictions for groundwater with arsenic concentrations above 10 μg/L and fluoride concentrations above 1.5 mg/L, whereas the RF model provided the best predictions for groundwater with arsenic concentrations above 50 μg/L and iodine concentrations above 100 μg/L. Groundwater health risk zones were then delineated using the optimal prediction models, and demographic analysis was conducted in both the direct and potential groundwater risk zones. Model predictions indicated that hundreds of thousands of people in the Hetao Basin face a public health crisis caused by high concentrations of arsenic, fluoride, and iodine in groundwater. These findings underscore the significant health challenge in the study area. Considering the agricultural development and increasing groundwater use in the area, our findings can guide local governments in managing the extent of groundwater development, establishing control zones, and enhancing protection measures for populations at risk from groundwater contamination.
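The model-selection step summarized above (labeling samples by a concentration threshold, then comparing models on AC, AUC, and MSE) can be illustrated with a minimal sketch. It is not the authors' implementation: the concentrations, the predicted probabilities, the model names `model_a` and `model_b`, the 0.5 probability cutoff, and the choice of AUC as the ranking metric are all assumptions made for illustration. Only the 10 μg/L arsenic threshold comes from the text.

```python
from typing import Dict, List, Sequence


def exceeds(concentrations: Sequence[float], threshold: float) -> List[int]:
    """Label each sample 1 if its concentration is above the threshold, else 0."""
    return [1 if c > threshold else 0 for c in concentrations]


def accuracy(y_true: Sequence[int], y_prob: Sequence[float], cutoff: float = 0.5) -> float:
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if p >= cutoff else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def mse(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - t) ** 2 for p, t in zip(y_prob, y_true)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up arsenic concentrations in ug/L (illustrative, not data from the study).
arsenic_ug_per_l = [3.0, 12.0, 55.0, 8.0, 20.0, 1.0]
labels = exceeds(arsenic_ug_per_l, threshold=10.0)  # 10 ug/L threshold from the text

# Hypothetical exceedance probabilities from two placeholder models.
model_probs: Dict[str, List[float]] = {
    "model_a": [0.1, 0.8, 0.9, 0.4, 0.6, 0.2],
    "model_b": [0.3, 0.4, 0.7, 0.6, 0.8, 0.1],
}

results = {
    name: {
        "accuracy": accuracy(labels, probs),
        "auc": auc(labels, probs),
        "mse": mse(labels, probs),
    }
    for name, probs in model_probs.items()
}

# Assumption: rank models by AUC; the abstract does not state how the metrics were combined.
best = max(results, key=lambda name: results[name]["auc"])
print(best, results[best])
```

In practice the probabilities would come from fitted SVM, RF, AdaBoost, and XGBoost classifiers evaluated on held-out samples, and the comparison would be repeated for each contaminant and threshold.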