Identifying and predicting nitrate inflow and distribution in groundwater is critical for controlling and managing groundwater contamination in rural mixed-land-use areas. Several groundwater nitrate prediction models have been developed; in particular, nitrate concentration models that use dissolved ions in groundwater as input variables can produce accurate results. However, obtaining sufficient chemical data from a target area remains challenging. We tested whether machine learning models can effectively determine nitrate contamination using field-measured data (pH, electrical conductivity, water temperature, dissolved oxygen, and redox potential) and existing geographic information system (GIS) data (lithology, land cover, and hydrogeological properties) from the Nonsan Stream Watershed in South Korea, an area where nitrate contamination occurs owing to intensive agricultural activities. In total, 183 groundwater samples from different wells, mixed municipal sites, and agricultural activities were used. Among the four machine learning models tested (artificial neural network (ANN), classification and regression tree (CART), random forest (RF), and support vector machine (SVM)), the RF (R²: 0.74; RMSE: 3.5) and SVM (R²: 0.80; RMSE: 2.8) achieved the highest prediction accuracy and smallest error in all groundwater parameter estimates. Land cover, aquifer type, and soil drainage were the primary input variables of the RF and SVM models, representing agricultural activity-related and hydrogeological infiltration effects. These results indicate that in rural areas with limited hydrochemical data, RF and SVM models can be used to identify areas at high risk of nitrate contamination using spatial variability, GIS-aided visualization, and easily accessible field-measured groundwater quality data.
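The two scores reported above, R² and RMSE, can be illustrated with a minimal sketch. It assumes the standard definitions (R² = 1 − SS_res/SS_tot; RMSE = square root of the mean squared residual), which the abstract does not spell out. The function names and the sample values are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the observations."""
    n = len(observed)
    squared_residuals = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_residuals / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical nitrate values for illustration only; NOT data from the study.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 8.0]

print(f"R2 = {r_squared(observed, predicted):.2f}, "
      f"RMSE = {rmse(observed, predicted):.2f}")
```

Under these definitions, a higher R² and a lower RMSE both indicate a closer fit, which is the sense in which the SVM scores (0.80; 2.8) are better than the RF scores (0.74; 3.5).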