The temporal aspect of groundwater vulnerability to contaminants such as nitrate is often overlooked because vulnerability is assumed to be static. This study addresses this gap by combining machine learning with the Detecting Breakpoints and Estimating Segments in Trend (DBEST) algorithm. The aim is to reveal the underlying relationships among nitrate, water table, vegetation cover, and precipitation time series, which are linked to agricultural activities and groundwater demand in a semi-arid region. The contamination probability of the Lenjanat Plain was mapped by comparing random forest (RF), support vector machine (SVM), and K-nearest neighbors (KNN) models. These models were fed with 32 input variables: DEM-derived factors, physiography, distance and density maps, and time series data. Imbalanced learning and feature selection techniques were also investigated as supplementary methods, yielding four scenarios in total. The best model was RF combined with forward sequential feature selection (SFS) and the SMOTE-Tomek resampling method, which outperformed the others (F1-score: 0.94, MCC: 0.83). Among the feature selection methods, the SFS techniques improved model accuracy the most, at the cost of higher computational expense. For handling imbalanced data, the cost-sensitive function proved more efficient than the other investigated methods. The DBEST method identified significant breakpoints in each time series, revealing a clear association between agricultural practices along the Zayandehrood River and substantial nitrate contamination in the Lenjanat region. Additionally, vulnerability maps were produced with the candidate RF model and with an ensemble of the best RF, SVM, and KNN models. These maps predicted moderate to high vulnerability in the central parts of the plain and in the downslope areas to the southwest.
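Model performance above is summarized with the F1-score and the Matthews correlation coefficient (MCC), both computed from a binary confusion matrix. The following is a minimal sketch of the standard formulas, not the authors' evaluation code. It assumes that the positive class is a "contaminated" sample, meaning nitrate above some threshold. The function name `f1_and_mcc` and the example counts are hypothetical.

```python
import math


def f1_and_mcc(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (F1-score, MCC) for a binary confusion matrix.

    Assumption (illustrative only): the positive class is a contaminated
    sample, i.e. nitrate concentration above a chosen threshold.
    """
    # Precision and recall concern only the positive class; guard against 0/0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # MCC uses all four cells, so it also penalizes errors on the negative class.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


if __name__ == "__main__":
    # Hypothetical, imbalanced counts (not taken from the study).
    f1, mcc = f1_and_mcc(tp=45, fp=3, fn=2, tn=10)
    print(f"F1 = {f1:.3f}, MCC = {mcc:.3f}")
```

Take the hypothetical counts in the usage example, where positives outnumber negatives. F1 is about 0.95, while MCC is only about 0.75. MCC also reflects how well the minority negative class is predicted, which is one common reason to report it alongside F1 on imbalanced data.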