Machine learning models show promise in identifying geogenic contaminated groundwaters. Modeling in regions with no or limited samples is challenging due to the need for large training sets. One potential solution is transferring existing models to such regions. This study explores the transferability of high fluoride groundwater models between basins in the Shanxi Rift System, considering six factors, including modeling methods, predictor types, data size, sample/predictor ratio (SPR), predictor range, and data informing. Results show that transferability is achieved only when model predictors are based on hydrochemical parameters rather than surface parameters. Data informing, i.e., adding samples from challenging regions to the training set, further enhances the transferability. Stepwise regression shows that hydrochemical predictors and data informing significantly improve transferability, while data size, SPR, and predictor range have no significant effects. Additionally, despite their stronger nonlinear capabilities, random forests and artificial neural networks do not necessarily surpass logistic regression in transferability. Lastly, we utilize the t-SNE algorithm to generate low-dimensional representations of data from different basins and compare these representations to elucidate the critical role of predictor types in transferability.
Read full abstract