Abstract

Exposure to geogenic contaminated groundwaters (GCGs) is a significant public health concern. Machine learning models are powerful tools for discovering potential GCGs. However, groundwater quality data are often insufficient, and GCGs are typically a minority class in those data, which hinders models from producing meaningful GCG predictions. To address this issue, a deep learning method, Siamese network-based transfer learning (SNTL), is used to estimate the probability that hazardous substances are present in groundwater above a threshold, based on limited and class-imbalanced data. SNTL greatly reduces the amount of required training data and eliminates the negative effects of class-imbalanced data on prediction model performance. Predictions for three typical GCGs (high-arsenic, high-fluoride, and high-iodine groundwater) show that the SNTL models provide higher (about 80%) and more balanced sensitivity and specificity than benchmark Random Forest models, indicating that SNTL models can predict both GCGs and non-GCGs. SNTL can therefore enable the protection of populations from GCG exposure in areas where other prediction methods fail to provide risk information because groundwater quality data are poor.
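
To make the "balanced sensitivity and specificity" criterion concrete, the following minimal Python sketch computes both metrics from binary labels. It is an illustration only, not the SNTL method or the authors' evaluation code: the function name `sensitivity_specificity` and the ten-sample label vector are hypothetical, with 1 standing for a GCG sample (concentration above the threshold) and 0 for a non-GCG sample.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = GCG, 0 = non-GCG)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # GCG correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # GCG missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-GCG correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-GCG wrongly flagged
    # Guard against a class being absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical class-imbalanced sample: 2 GCG wells, 8 non-GCG wells.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A model that always predicts the majority class looks accurate (8/10 correct)
# but detects no GCGs at all.
majority_pred = [0] * 10
sens, spec = sensitivity_specificity(y_true, majority_pred)
print(sens, spec)
```

In this made-up case the always-negative predictor reaches perfect specificity with zero sensitivity, which is the kind of imbalance the abstract says class-imbalanced training data can cause. A model that predicts both GCGs and non-GCGs would score well on both metrics at once.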
