Abstract

Accurate water quality prediction is crucial for effective environmental management and decision-making. However, previous studies have relied solely on historical data to simulate water quality, overlooking potential discrepancies between predicted values and actual observations. In addition, the opacity of machine learning models has undermined the credibility of their predictions. Given the excellent nonlinear fitting ability of ensemble tree models, especially the Categorical Boosting (Catboost) model, this study proposes a knowledge-guided Catboost (KGCatboost) model for predicting dissolved oxygen (DO) concentration, a vital water quality indicator, in 15 river sections of the Yangtze River Basin in Yunnan Province, China. To enhance the model’s interpretability, we employ the SHapley Additive exPlanations (SHAP) method to analyze the contribution of each input variable. The results show that, on the test set of each dataset, the mean Nash-Sutcliffe Efficiency (NSE) of KGCatboost is 0.874, an improvement of 0.34% and 3.07% over Catboost and eXtreme Gradient Boosting (Xgboost), respectively. The study also finds that pH has the greatest impact on DO concentration: as pH increases, DO concentration increases significantly. A regulatory mechanism is also developed to alleviate the hazards caused by low DO concentrations. The KGCatboost model can provide valuable guidance for water resource management.
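
For readers unfamiliar with the metric reported above, the following is a minimal sketch of the standard Nash-Sutcliffe Efficiency definition, NSE = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2). The function name and the sample DO values are illustrative assumptions and are not taken from the study.

```python
def nash_sutcliffe_efficiency(observed, predicted):
    """Return the NSE of predictions against observations.

    NSE = 1 means a perfect fit; NSE = 0 means the model is no better
    than always predicting the mean of the observations.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_obs = sum(observed) / len(observed)
    # Sum of squared errors of the model.
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Sum of squared deviations of the observations from their mean.
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined when observations are constant")
    return 1.0 - residual / variance


# Hypothetical DO observations in mg/L (illustrative values only).
do_observed = [6.0, 7.0, 8.0, 9.0]
nse_perfect = nash_sutcliffe_efficiency(do_observed, do_observed)
nse_mean = nash_sutcliffe_efficiency(do_observed, [7.5, 7.5, 7.5, 7.5])
```

A perfect prediction yields an NSE of 1, while a model that always outputs the observed mean yields 0, which gives a sense of scale for the reported mean of 0.874.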
