Groundwater resources in the Kingdom of Saudi Arabia (KSA) have high levels of natural radioactivity. In northwestern KSA, gross alpha (α) and gross beta (β) activities exceed national and international drinking-water limits. In this study, we developed and applied an automated machine learning (AML) approach to quantify relationships between gross α and gross β activities and geological, hydrogeological, and geochemical conditions. Two AML model groups (group I for gross α; group II for gross β) were constructed from water samples collected at 360 irrigation and water supply wells, with the aim of defining a robust model that explains the spatial variability in gross α and gross β activities and identifies the variables that control them. Each group contained four model families: deep neural network (DNN), gradient boosting machine (GBM), generalized linear model (GLM), and distributed random forest (DRF). Model inputs included chemical compositions as well as geological and hydrogeological conditions. Three performance metrics were used to evaluate the models during training and testing: normalized root mean square error (NRMSE), Pearson's correlation coefficient (r), and the Nash-Sutcliffe efficiency (NSE) coefficient.
Results indicate that (1) the GBM model outperformed the DNN, DRF, and GLM models when modelling gross α activities (training: NRMSE 0.37 ± 0.10, r 0.92 ± 0.05, NSE 0.85 ± 0.09; testing: NRMSE 0.71 ± 0.08, r 0.72 ± 0.08, NSE 0.49 ± 0.12); (2) gross α activities are controlled by pH, stream density, nitrate, manganese, and vegetation index; (3) the DRF model outperformed the GBM, DNN, and GLM models when modelling gross β activities (training: NRMSE 0.41 ± 0.05, r 0.92 ± 0.02, NSE 0.83 ± 0.04; testing: NRMSE 0.67 ± 0.09, r 0.77 ± 0.07, NSE 0.54 ± 0.12); (4) the input variables that affect gross β activities are pH, temperature, stream density, lithology, and nitrate; and (5) no single model could be used to model both gross α and gross β activities; instead, a combination of AML models should be used. Our computationally efficient approach provides a framework and insights for using AML techniques in water quality investigations, and it promotes broader and better use of geological, hydrogeological, and geochemical datasets by the scientific community and decision makers to develop guidelines for mitigation.
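
For readers unfamiliar with the three evaluation metrics named above, the following is a minimal sketch of how they are commonly computed. It is not the authors' implementation. The abstract does not state how the RMSE was normalized; this sketch assumes normalization by the standard deviation of the observations, a choice under which NSE = 1 − NRMSE², which agrees with the reported pairs to within rounding (for example, 1 − 0.37² ≈ 0.86 against a reported NSE of 0.85). The function names and the sample values are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def nrmse(obs, sim):
    """RMSE divided by the population standard deviation of the observations.

    The normalization is an assumption; the abstract does not specify it.
    """
    obs_mean = _mean(obs)
    mse = _mean([(s - o) ** 2 for o, s in zip(obs, sim)])
    obs_var = _mean([(o - obs_mean) ** 2 for o in obs])
    return math.sqrt(mse / obs_var)


def pearson_r(obs, sim):
    """Pearson's correlation coefficient between observed and simulated values."""
    obs_mean, sim_mean = _mean(obs), _mean(sim)
    cov = sum((o - obs_mean) * (s - sim_mean) for o, s in zip(obs, sim))
    obs_ss = sum((o - obs_mean) ** 2 for o in obs)
    sim_ss = sum((s - sim_mean) ** 2 for s in sim)
    return cov / math.sqrt(obs_ss * sim_ss)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the residual sum of
    squares to the sum of squared deviations of the observations from their mean."""
    obs_mean = _mean(obs)
    residual_ss = sum((o - s) ** 2 for o, s in zip(obs, sim))
    total_ss = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - residual_ss / total_ss


# Hypothetical observed and simulated activities, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.5, 1.5, 3.5, 3.5]

print(nrmse(observed, simulated))
print(pearson_r(observed, simulated))
print(nse(observed, simulated))
```

Under this normalization, an NSE of 1 (equivalently an NRMSE of 0) indicates a perfect fit, while an NSE of 0 (an NRMSE of 1) means the model predicts no better than the mean of the observations.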