Siamese Network-Based Transfer Learning Model to Predict Geogenic Contaminated Groundwaters.

Hailong Cao, Guibin Jiang, Yanxin Wang, Jianbo Shi, Xianjun Xie

doi:10.1021/acs.est.1c08682

Abstract

Exposure to geogenic contaminated groundwaters (GCGs) is a significant public health concern. Machine learning models are powerful tools for discovering potential GCGs. However, insufficient groundwater quality data, and the fact that GCGs are typically a minority class in the data, hinder models from producing meaningful GCG predictions. To address this issue, a deep learning method, Siamese network-based transfer learning (SNTL), is used to estimate the probability that hazardous substances are present in groundwater above a threshold based on limited and class-imbalanced data. SNTL greatly reduces the amount of required training data and eliminates the negative effects of class imbalance on prediction model performance. Predictions for three typical GCGs (high-arsenic, high-fluoride, and high-iodine groundwater) show that the SNTL models provide higher (about 80%) and more balanced sensitivity and specificity than benchmark Random Forest models, indicating that SNTL models can predict both GCGs and non-GCGs. SNTL can therefore help protect populations from GCG exposure in areas where other prediction methods fail to provide risk information because of poor groundwater quality data.
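
The role of sensitivity and specificity on class-imbalanced data can be illustrated with a minimal sketch. It is not the authors' code, and every number in it is hypothetical: it assumes 100 wells of which 10 exceed the contaminant threshold, and compares a trivial model that always predicts "non-GCG" with an invented model that is right 80% of the time on each class. The function name `sensitivity_specificity` and the label encoding (1 = GCG, 0 = non-GCG) are assumptions made for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = GCG, 0 = non-GCG."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # GCG correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # GCG missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-GCG correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-GCG wrongly flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical imbalanced data set: 10 GCG wells followed by 90 non-GCG wells.
y_true = [1] * 10 + [0] * 90

# Trivial model: always predicts non-GCG.
always_negative = [0] * 100

# Invented balanced model: finds 8 of 10 GCGs and clears 72 of 90 non-GCGs.
balanced = [1] * 8 + [0] * 2 + [0] * 72 + [1] * 18

for name, y_pred in [("always_negative", always_negative), ("balanced", balanced)]:
    sens, spec = sensitivity_specificity(y_true, y_pred)
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.2f} "
          f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this invented setting the trivial model has the higher accuracy yet detects no GCGs at all, which is why sensitivity and specificity are reported together when the class of interest is the minority.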

Full Text