Abstract
Water is a valuable, necessary and unfortunately rare commodity in both developing and developed countries all over the world. It is undoubtedly the most important natural resource on the planet and constitutes an essential nutrient for human health. Geo-environmental pollution can be caused by many different types of waste, such as municipal solid, industrial, agricultural (e.g., pesticides and fertilisers), medical, etc., making the water unsuitable for use by any living being. Therefore, finding efficient methods to automate checking of water suitability is of great importance. In the context of this research work, we leveraged a supervised learning approach in order to design as accurate as possible predictive models from a labelled training dataset for the identification of water suitability, either for consumption or other uses. We assume a set of physiochemical and microbiological parameters as input features that help represent the water’s status and determine its suitability class (namely safe or nonsafe). From a methodological perspective, the problem is treated as a binary classification task, and the machine learning models’ performance (such as Naive Bayes–NB, Logistic Regression–LR, k Nearest Neighbours–kNN, tree-based classifiers and ensemble techniques) is evaluated with and without the application of class balancing (i.e., use or nonuse of Synthetic Minority Oversampling Technique–SMOTE), comparing them in terms of Accuracy, Recall, Precision and Area Under the Curve (AUC). In our demonstration, results show that the Stacking classification model after SMOTE with 10-fold cross-validation outperforms the others with an Accuracy and Recall of 98.1%, Precision of 100% and an AUC equal to 99.9%. In conclusion, in this article, a framework is presented that can support the researchers’ efforts toward water quality prediction using machine learning (ML).
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.