ABSTRACT Water pollution remains a longstanding challenge globally, prompting substantial investment in water quality protection. The integration of advanced machine learning models offers promising avenues for accurate water quality prediction, enabling proactive measures to safeguard water sources. Presently, water quality assessment relies predominantly on physical and chemical metrics. This study developed MultiLayer Perceptron (MLP), eXtreme Gradient Boosting (XGBoost), long short-term memory (LSTM), and a hybrid CNN–LSTM model to forecast pH and dissolved oxygen (DO) levels. Results demonstrated the hybrid model's superior performance, with mean squared errors (MSEs) of 0.0015 and 0.0361 for pH and DO prediction, respectively. For Water Quality Classification (WQC), Random Forest (RF), k-Nearest Neighbors (kNNs), Support Vector Machine (SVM), and Light gradient Boosting Machine (Light GBM) were employed, with SVM achieving the highest accuracy at 88.75%. The research underscores the effectiveness of the CNN–LSTM model in predicting pH and DO levels. Leveraging these predictions as inputs to the SVM model offers valuable insights, particularly in regions where conventional monitoring methods face limitations. This streamlined approach, requiring only two parameters, signifies a significant advancement in accurate water quality prediction.
Read full abstract