Abstract

Background: Water quality directly or indirectly affects human health and environmental well-being. Water quality data can be evaluated using a Water Quality Index (WQI), which is a quick and affordable way to summarise the quality of water.

Objective: The objective of this study is to find data preparation strategies for categorizing a water quality dataset from two remote Indian villages in different geographic locations, to predict the quality of water, and to identify low-quality water before it is made accessible for human consumption.

Methods: Four water quality features that are crucial for human consumption, namely Nitrate, pH, Residual Chlorine, and Total Dissolved Solids, are used to determine the quality of water. The workflow has five steps: (1) data preprocessing with min-max normalization; (2) computing the WQI; (3) using feature correlation to identify the importance of each parameter to the WQI; (4) applying supervised machine learning regression models, namely Random Forest (RF), Multiple Linear Regression (MLR), Gradient Boosting (GB), and Support Vector Machine (SVM), to predict the WQI; and (5) combining the classification models K-Nearest Neighbour (KNN), Support Vector Classifier (SVC), and Multi-layer Perceptron (MLP) in a stacked ensemble classifier, with Logistic Regression (LR) acting as the meta-learner, to predict the Water Quality Class (WQC) more accurately.

Results: On the test data, the RF and MLR regression models performed best in predicting the WQI, with mean absolute errors (MAE) of 0.003 and 0.001 respectively. The mean square error (MSE), root mean square error (RMSE), R squared (R2), and Explained Variance Score (EVS) were 0.002, 0.005, 0.988, and 0.998 respectively for RF, and 0.001, 0.031, 0.999, and 0.999 respectively for MLR. For predicting the WQC, the stacked ensemble classifier showed the best performance, with an Accuracy of 0.936, F1 score of 0.93, and Matthews Correlation Coefficient (MCC) of 0.893 on the Lalpura dataset, and an Accuracy of 0.991, F1 score of 0.991, and MCC of 0.981 on the Heingang dataset.

Conclusion: This study explores a method for predicting water quality that combines simple, feasible water quality measurements with machine learning. The stacked ensemble classifier performed best for multiclass classification. The findings should motivate researchers to investigate the root causes of variations in water quality, so that high-quality water can be supplied throughout the year.
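To illustrate two pieces of the workflow named in the Methods and Results, min-max normalization and the regression error metrics (MAE, MSE, RMSE, R2), here is a minimal Python sketch using only the standard library. The function names and all sample values are hypothetical and chosen for illustration only; they are not taken from the study's dataset or code, and the study's own implementation may differ.

```python
import math


def min_max_normalize(values):
    """Rescale a list of numbers linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)  # RMSE is the square root of MSE
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R2 is undefined when the true values are all identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical pH readings (illustration only, not study data).
ph_readings = [6.5, 7.0, 7.5, 8.5]
ph_scaled = min_max_normalize(ph_readings)

# Hypothetical true and predicted WQI values (illustration only).
wqi_true = [0.2, 0.4, 0.6, 0.8]
wqi_pred = [0.25, 0.35, 0.6, 0.8]
metrics = regression_metrics(wqi_true, wqi_pred)

print(ph_scaled)
print(metrics)
```

In practice, the scaling bounds would normally be computed on the training split only and then reused for the test split, so that no information from the test data leaks into preprocessing.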
