Water is one of the world's most precious resources and is essential to life. Industrial waste, agricultural runoff, and urban discharge degrade water quality and can render water unfit for consumption, so monitoring and evaluating water quality is more important than ever. Big Data analytics can be used to examine water quality using large datasets of pH, hardness, solids concentration, chloramine, sulfate, conductivity, organic carbon, trihalomethanes, and turbidity measurements. This work classifies water potability, which is vital for human consumption, using machine learning. Three classifiers, Random Forest, Gradient Boosting, and Support Vector Machine (SVM), were trained on data from 3,276 water bodies. After significant data preparation and training, the Random Forest classifier obtained the highest accuracy at 66.77%, followed by Gradient Boosting at 66.01% and SVM at 62.80%. These results suggest that Big Data analytics and machine learning algorithms can help interpret complex water quality data for public health and natural resource management. The Random Forest and SVM classifiers in this study predict water potability from these measurements. Prediction algorithms that take water quality data as input may aid public safety and water resource monitoring.
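As an illustration of the kind of comparison described above, the following is a minimal sketch, not the study's actual pipeline. It assumes scikit-learn, uses synthetic data with nine features as a stand-in for the nine water quality measurements, and picks an 80/20 stratified split and default hyperparameters; the function name `compare_classifiers` and all of these settings are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(X, y, test_size=0.2, seed=0):
    """Train three classifiers on one split and return their test accuracies.

    Hypothetical helper: the split ratio and hyperparameters are assumptions,
    not values reported in the abstract.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    models = {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "Gradient Boosting": GradientBoostingClassifier(random_state=seed),
        # SVMs are sensitive to feature scale, so standardize inside a pipeline.
        "SVM": make_pipeline(StandardScaler(), SVC()),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic placeholder data: 9 features standing in for pH, hardness, solids,
# chloramine, sulfate, conductivity, organic carbon, trihalomethanes, turbidity.
X, y = make_classification(
    n_samples=500, n_features=9, n_informative=5, random_state=0
)
scores = compare_classifiers(X, y)
for name, acc in scores.items():
    print(f"{name}: {acc:.2%}")
```

On real measurements, the synthetic `X` and `y` would be replaced by the nine feature columns and the potability label, after whatever missing-value handling the data requires. The accuracies printed here come from synthetic data and are not comparable to the figures reported in the abstract.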