Abstract

Exposure to contaminated water during aquatic recreational activities can cause gastrointestinal diseases. To reduce the exposure risk, the fecal indicator bacterium Escherichia coli is routinely monitored, a process that is time-consuming, labor-intensive, and costly. To assist stakeholders in the daily management of bathing sites, models have been developed to predict microbiological quality. However, model performance is highly dependent on the quality of the input data, which are usually scarce. In this study, we proposed a conceptual framework for selecting the most suitable model and for enriching the training dataset. This framework was successfully applied to the prediction of Escherichia coli concentrations in the Marne River (Paris area, France). We compared the performance of six machine learning (ML)-based models: K-nearest neighbors, Decision Tree, Support Vector Machines, Bagging, Random Forest, and Adaptive Boosting. Based on several statistical metrics, the Random Forest model was the most accurate of the six. However, according to the mean absolute percentage error (MAPE), 53.2 ± 3.5% of the predicted E. coli densities were inaccurately estimated. Four parameters (temperature, conductivity, 24 h cumulative rainfall on the day preceding sampling, and river flow) were identified as key variables to monitor in order to optimize the ML model. The optimized set of values will feed an alert system that monitors the microbiological quality of the water through a combined strategy of in situ manual sampling and a deployed network of sensors. Based on these results, we propose a guideline for ML model selection and sampling optimization.
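One of the metrics cited above is the MAPE, defined as MAPE = (100/n) Σ |y_i − ŷ_i| / |y_i|, where y_i is an observed E. coli density and ŷ_i the corresponding prediction. The minimal Python sketch below computes it and ranks candidate models from lowest to highest MAPE. The function names and all values are hypothetical, for illustration only; the sketch does not reproduce the study's pipeline, its other statistical metrics, or the criterion it used to classify individual predictions as inaccurate.

```python
from statistics import mean


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (hypothetical helper).

    MAPE is undefined when an observed value is zero, so that case is rejected.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100 * mean(abs(o - p) / abs(o) for o, p in zip(observed, predicted))


def rank_models(observed, predictions_by_model):
    """Return (model name, MAPE) pairs sorted from best (lowest) to worst."""
    scores = {name: mape(observed, preds) for name, preds in predictions_by_model.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical E. coli densities (per 100 mL) and model outputs, illustration only.
observed = [100.0, 200.0, 400.0, 800.0]
predictions = {
    "KNN": [150.0, 250.0, 300.0, 600.0],
    "RandomForest": [110.0, 180.0, 440.0, 760.0],
}

for name, score in rank_models(observed, predictions):
    print(f"{name}: MAPE = {score:.2f}%")
```

Because MAPE divides each error by the observed value, the same absolute error weighs more at low densities than at high ones, which is worth keeping in mind when comparing models on data spanning several orders of magnitude.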

Highlights

  • The regulatory monitoring of bathing waters is based on the enumeration of culturable fecal indicator bacteria, Escherichia coli and intestinal enterococci (e.g., European Bathing Water Directive 2006/7/EC)

  • We compare the performance of six machine learning models, three traditional and three ensemble models, to predict the concentrations of the fecal indicator bacterium Escherichia coli

  • For the water quality parameters investigated in this work, a wide range of sensors can collect the data with acceptable quality


Summary

Introduction

Worldwide, recent heat wave episodes have intensified aquatic recreational activities in megacities, increasing interactions between citizens and freshwater in urban contexts [1]. Many cities, such as Paris, London, and Berlin, promote the opening of bathing areas and organize open water swimming competitions in their rivers. The regulatory monitoring of bathing waters is based on the enumeration of culturable fecal indicator bacteria, Escherichia coli and intestinal enterococci (e.g., European Bathing Water Directive 2006/7/EC). Such surveys are costly, time-consuming, and labor-intensive; consequently, weekly or monthly sampling strategies are routinely implemented, supplemented by event-based sampling [11,12].
