Assessing water quality is essential for managing freshwater resources, safeguarding ecosystems, and protecting public health. Traditional water quality assessment relies on seasonal sampling, many measured parameters, and labor-intensive field work, all of which constrain frequent monitoring of large river basins. To overcome these constraints, this study combines remote sensing-based climatic and land use parameters with Principal Component Analysis (PCA), Artificial Neural Networks (ANN), and machine learning (ML) algorithms to predict the Water Quality Index (WQI). The Weighted Arithmetic Water Quality Index (WAWQI) method was used to calculate the WQI of the Godavari River Basin from the 19 available stream water quality parameters (SWQPs). PCA was then applied to reduce the number of parameters from 19 to 6. These results led to two modeling methods for predicting the WQI. The first method, a correlation-based model, predicts WQI from the six retained SWQPs. The second method, a cause-effect model, predicts WQI from land use and meteorological factors. Using AutoML techniques, an initial pool of 40 ML models was evaluated and narrowed to the three best-performing models: Extreme Gradient Boosting (XGB), Extra Trees (ET), and Random Forest (RF). In both methods, XGB gave the best predictions, with a coefficient of determination (R²) of 0.95 in training and 0.83 in testing for the first method, and 0.93 in training and 0.80 in testing for the second. To improve these results, the outputs of XGB, ET, and ANN were stacked with the ML models in both methods. Of the three stacked models, stacked ANN_ML outperformed stacked XGB_ML and stacked ET_ML.
In the first method, the stacked ANN_ML model achieves R² values of 0.95 and 0.91 for training and testing, respectively; in the second method, it achieves 0.95 and 0.90. These findings highlight the stacked model's ability to capture nonlinear relationships among the parameters, and the novelty of predicting WQI from land use and climate parameters, which can replace laborious, time-consuming SWQP measurements.
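
For readers unfamiliar with the WAWQI method named above, the following is a minimal sketch of the standard weighted arithmetic formulation, not the study's own implementation. It assumes the usual definitions: a quality rating Q_i = 100 · |V_i − V_ideal| / |S_i − V_ideal| and a unit weight W_i = K / S_i with K = 1 / Σ(1 / S_i), giving WQI = Σ(W_i · Q_i) / Σ W_i. The function name `wawqi`, the three parameters, and their limits in `STANDARD` and `IDEAL` are illustrative assumptions; the abstract does not list the 19 SWQPs or the permissible limits used.

```python
def wawqi(measured, standard, ideal):
    """Weighted Arithmetic Water Quality Index for a single sample.

    measured, standard, ideal: dicts keyed by parameter name, holding the
    observed value, the permissible (standard) limit, and the ideal value.
    """
    # Proportionality constant K, chosen so the unit weights sum to 1.
    k = 1.0 / sum(1.0 / s for s in standard.values())

    weighted_sum = 0.0
    weight_total = 0.0
    for name, s in standard.items():
        w = k / s  # unit weight: inversely proportional to the standard limit
        # Quality rating: 0 at the ideal value, 100 at the standard limit.
        q = 100.0 * abs(measured[name] - ideal[name]) / abs(s - ideal[name])
        weighted_sum += w * q
        weight_total += w
    return weighted_sum / weight_total


# Illustrative limits only (assumed for this sketch, not taken from the study).
STANDARD = {"pH": 8.5, "DO_mg_L": 5.0, "BOD_mg_L": 5.0}
IDEAL = {"pH": 7.0, "DO_mg_L": 14.6, "BOD_mg_L": 0.0}

# Hypothetical sample values, used only to show the call.
sample = {"pH": 7.8, "DO_mg_L": 6.5, "BOD_mg_L": 3.0}
sample_wqi = wawqi(sample, STANDARD, IDEAL)
```

Under these definitions, a sample at the ideal values scores 0 and a sample exactly at the standard limits scores 100, so lower values indicate better water quality.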