The plain area of Henan Province is the granary of China, yet its regional aquifer is being polluted by industrial wastewater, agricultural pesticides and fertilizers, and domestic wastewater. To safeguard food and drinking water security, and to address the low prediction accuracy caused by scarce and unevenly distributed groundwater monitoring data, we propose a new approach that predicts aquifer vulnerability over large areas from abundant small-scale data, thereby identifying pollution risks despite the sample shortage. In small regions with abundant nitrate data, we employed a Random Forest model to screen the key impact indicators, which served as features, with nitrate-N concentration as the target variable. We then established six machine learning prediction models and selected the best-performing bagging model (R² = 0.86) to predict aquifer vulnerability in larger regions lacking nitrate data. The predictions showed that highly vulnerable areas accounted for 20% of the region and were mainly affected by aquifer thickness (65.91%). A high nitrate-N concentration implies serious aquifer contamination. Therefore, using a long series of large-scale groundwater nitrate-N monitoring data, we found that the trend and slope of nitrate-N concentration were significantly correlated with the model predictions (Spearman's correlation coefficients of 0.75 and 0.58). This study can help identify the risk of aquifer contamination and overcome the sample shortage in large areas, thus contributing to food and drinking water security.
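The validation step described above, comparing the slope of each monitoring series with the predicted vulnerability through Spearman's rank correlation, can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation. The abstract does not say how the slope was estimated, so the use of Sen's slope (the median of pairwise slopes) is an assumption, and the function names `average_ranks`, `spearman`, and `sen_slope` as well as the sample values are hypothetical.

```python
import math
from statistics import median


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def sen_slope(times, values):
    """Sen's slope (assumed estimator): median of all pairwise slopes."""
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i in range(len(times))
        for j in range(i + 1, len(times))
        if times[j] != times[i]
    ]
    if not slopes:
        raise ValueError("need at least two distinct time points")
    return median(slopes)


if __name__ == "__main__":
    # Hypothetical wells: yearly nitrate-N series and a predicted vulnerability score.
    years = [2015, 2016, 2017, 2018]
    series = [
        [2.0, 2.1, 2.1, 2.3],
        [5.0, 5.6, 6.1, 6.9],
        [9.0, 10.2, 11.8, 13.1],
    ]
    predicted = [0.2, 0.5, 0.9]
    slopes = [sen_slope(years, s) for s in series]
    print(spearman(slopes, predicted))
```

Rank-based correlation fits this comparison because it only requires a monotonic relationship between the predicted vulnerability and the observed nitrate-N behaviour, not a linear one.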