Groundwater resources in Bitlis province and its surroundings in Türkiye’s Eastern Anatolia Region are pivotal for drinking water, yet they face a significant threat from fluoride contamination, compounded by the region’s volcanic rock structure. To address this concern, fluoride levels were measured at 30 points in the dry period (June 2019) and the rainy period (September 2019). Current measurement techniques are accurate, but their time-consuming nature makes them economically unviable. Therefore, this study aims to assess the distribution of probable geogenic contamination of groundwater and to develop a robust prediction model by analyzing the relationship between predictive variables and the target contaminant. To this end, several machine learning regression models (Linear Regression, Random Forest, Decision Tree, K-Neighbors, and XGBoost) and deep learning models (ANN, DNN, CNN, and LSTM) were employed. The concentrations of aluminum (Al), boron (B), cadmium (Cd), cobalt (Co), chromium (Cr), copper (Cu), iron (Fe), manganese (Mn), nickel (Ni), phosphorus (P), lead (Pb), and zinc (Zn) were used as features to predict fluoride levels. The SelectKBest feature selection method was used to improve the accuracy of the prediction model. This method ranks the features in the dataset and retains the k most important ones, for different values of k, which increases model efficiency. By using only the most important variables, the models produced more accurate predictions. The findings highlight the superior performance of the XGBoost regressor and CNN in predicting groundwater quality, with XGBoost consistently outperforming the other models and exhibiting the lowest values for evaluation metrics such as mean squared error (MSE), mean absolute error (MAE), and root mean squared error (RMSE) across different k values.
For instance, when all features were considered, XGBoost attained an MSE of 0.07, an MAE of 0.22, an RMSE of 0.27, a mean absolute percentage error (MAPE) of 9.25%, and a Nash-Sutcliffe efficiency (NSE) of 0.75. Conversely, the Decision Tree regressor consistently displayed inferior performance, with its MSE peaking at 0.11 and its RMSE at 0.33 (both at k = 5). Furthermore, feature selection analysis revealed the consistent significance of boron (B) and cadmium (Cd) across all datasets, underscoring their importance as predictors of fluoride contamination in groundwater. Notably, in the machine learning framework evaluation, the XGBoost regressor performed best on both the “all” and “rainy season” datasets, while the convolutional neural network (CNN) performed best on the “dry season” dataset. This study emphasizes the potential of the XGBoost regressor and CNN for accurate groundwater quality prediction and recommends their use, while acknowledging the limitations of the Decision Tree regressor.
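
The evaluation metrics named above can be illustrated with a minimal sketch. It assumes the standard textbook definitions of MSE, MAE, RMSE, MAPE, and the Nash-Sutcliffe efficiency; the function name `regression_metrics` and the sample values are hypothetical and are not taken from the study's code or data.

```python
import math


def regression_metrics(observed, predicted):
    """Return MSE, MAE, RMSE, MAPE (%) and NSE for paired observations.

    Hypothetical helper using standard definitions; not the study's code.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]

    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)

    # MAPE is undefined when an observed value is zero.
    mape = 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n

    # NSE compares the model's squared error with that of a baseline that
    # always predicts the observed mean: 1 is a perfect fit, 0 matches the
    # baseline, and negative values are worse than the baseline.
    mean_obs = sum(observed) / n
    baseline_sse = sum((o - mean_obs) ** 2 for o in observed)
    nse = 1.0 - sum(e * e for e in errors) / baseline_sse

    return {"mse": mse, "mae": mae, "rmse": rmse, "mape": mape, "nse": nse}


if __name__ == "__main__":
    # Illustrative values only (not measurements from the study).
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.2, 1.9, 3.3, 3.8]
    for name, value in regression_metrics(observed, predicted).items():
        print(f"{name}: {value:.3f}")
```

Lower MSE, MAE, RMSE, and MAPE indicate a better fit, whereas NSE is better the closer it is to 1, which is why the two groups of metrics are read in opposite directions when comparing models.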