Abstract
Understanding the factors that influence fecal contamination in groundwater is critical for ensuring water safety and public health. The objective of this study is to identify key factors for fecal contamination of shallow tubewells using machine learning methods. Three feature-ranking methods, namely recursive feature elimination (RFE) with XGBoost, Random Forest, and mutual information, were applied to E. coli presence and concentration in 1495 tubewell water samples from Matlab, Bangladesh. For E. coli presence, climatic variables, including average rainfall and temperature over the 30, 15, and 7 days preceding sampling, as well as ambient temperature and rainfall on the sampling day, emerged as critical predictors. Land cover characteristics, such as the percentages of urban and agricultural areas within 100 m of a tubewell, were also significant. For E. coli concentration, land cover characteristics within 100 m, the number of hot and heavy-rain days in the 30 days preceding sampling, average rainfall and temperature in the 3 days preceding sampling, and ambient temperature on the sampling day were identified as key drivers. The rankings produced by Random Forest and mutual information were more similar to each other than either was to the ranking from RFE with XGBoost. The findings highlight the interplay among climatic factors, land use, and population density in determining fecal contamination in shallow well water and demonstrate the utility of machine learning algorithms for ranking these factors.
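As a rough illustration of how three such rankings can be produced and compared, the following minimal sketch runs on synthetic data. It is not the study's pipeline: the data are generated, the feature names are hypothetical placeholders, and scikit-learn's GradientBoostingClassifier stands in for XGBoost so that the example depends only on scikit-learn and SciPy. Agreement between rankings is summarized with Spearman's rank correlation, which is an assumed choice of similarity measure.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE, mutual_info_classif

# Hypothetical feature names, for illustration only (not the study's variables).
feature_names = [
    "rain_30d", "rain_7d", "temp_30d", "temp_7d",
    "urban_pct_100m", "agri_pct_100m", "noise_1", "noise_2",
]

# Synthetic binary target standing in for E. coli presence/absence.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, n_redundant=2,
    shuffle=False, random_state=0,
)


def to_rank(scores):
    """Convert importance scores to ranks, where 1 is the most important."""
    return rankdata(-np.asarray(scores), method="ordinal").astype(int).tolist()


# 1) RFE with a gradient-boosted tree model (stand-in for XGBoost).
#    Eliminating down to one feature yields a full ranking in `ranking_`.
rfe = RFE(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    n_features_to_select=1, step=1,
).fit(X, y)

# 2) Random Forest impurity-based importances.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# 3) Mutual information between each feature and the target.
mi = mutual_info_classif(X, y, random_state=0)

ranks = {
    "rfe_gb": rfe.ranking_.astype(int).tolist(),
    "random_forest": to_rank(rf.feature_importances_),
    "mutual_info": to_rank(mi),
}

# Pairwise agreement between rankings (Spearman's rho).
pairs = [
    ("rfe_gb", "random_forest"),
    ("rfe_gb", "mutual_info"),
    ("random_forest", "mutual_info"),
]
agreement = {}
for a, b in pairs:
    rho, _ = spearmanr(ranks[a], ranks[b])
    agreement[(a, b)] = float(rho)

for name, r in ranks.items():
    print(name, dict(zip(feature_names, r)))
for pair, rho in agreement.items():
    print(pair, round(rho, 3))
```

On real data, the same comparison would be repeated for the concentration outcome with regression counterparts of each method (for example, mutual_info_regression in place of mutual_info_classif).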