Predicting the presence of E. coli in tap water using machine learning in Nepal

So Kuroki, Maiko Sakamoto, Ryuji Ogata

doi:10.1111/wej.12844

Abstract

In developing countries, a multitude of problems affecting the water supply process can result in the contamination of water taps. While machine learning has become popular for efficient water quality prediction, acquiring the data needed for modelling in developing countries is challenging. This study constructs water quality prediction models using machine learning together with a pseudo-pipeline network that compensates for missing data on the water supply process. Using both water source and water tap quality information measured by the Government of Nepal, we apply three machine learning models: support vector machine (SVM), random forest (RF) and LightGBM. We also apply a traditional statistical method, logistic regression (LR), to predict Escherichia coli (E. coli) contamination in water taps. With some input variables (such as the length from the nearest sources) obtained from the pseudo-pipeline network, the results show that SVM has stable and high accuracy both for all 26 cities (70%) and for the 25 cities excluding Kathmandu (79%). LR achieved significantly lower accuracy for all 26 cities (61%) than for the 25 cities excluding Kathmandu (79%). Additionally, we show that our method can be applied to other regions where a water quality survey has not yet been conducted.
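
As an illustration of the kind of input variable mentioned in the abstract (the length from the nearest sources), the following minimal sketch derives a distance feature for a water tap. The abstract does not say how the pseudo-pipeline network is built, so this sketch assumes straight-line (great-circle) distance as a simplified stand-in for pipeline length. The function names, the source identifiers and the coordinates are hypothetical and are not taken from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_source(tap, sources):
    """Return (source_id, distance_km) for the source closest to the tap.

    tap     -- (lat, lon) of the water tap
    sources -- iterable of (source_id, lat, lon)

    Assumption: straight-line distance approximates the length along the
    pseudo-pipeline; the study's actual network construction may differ.
    """
    best_id, best_dist = None, math.inf
    for source_id, lat, lon in sources:
        dist = haversine_km(tap[0], tap[1], lat, lon)
        if dist < best_dist:
            best_id, best_dist = source_id, dist
    return best_id, best_dist


# Hypothetical coordinates for illustration only (not data from the study).
example_sources = [("S1", 27.70, 85.30), ("S2", 27.75, 85.40)]
example_tap = (27.71, 85.32)
source_id, length_km = nearest_source(example_tap, example_sources)
print(source_id, round(length_km, 2))
```

A feature of this kind could then be combined with source and tap quality measurements as input to a classifier such as SVM, RF, LightGBM or LR.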
