Spatially adaptive machine learning models for predicting water quality in Hong Kong

Qiaoli Wang, Zijun Li, Jiannan Cai, Mengsheng Zhang, Zida Liu, Yu Xu, Rongrong Li

doi:10.1016/j.jhydrol.2023.129649

Abstract

Water quality prediction in spatially heterogeneous environments is challenging because the importance of water quality parameters (WQPs) and the performance of prediction models may vary across space. Thus, this study proposed spatially adaptive machine learning models to predict water quality status in Hong Kong. First, spatial clusters with relatively homogeneous water quality were adaptively detected using dynamically constrained agglomerative clustering and partitioning. Then, the optimal prediction model was constructed for each cluster by locally evaluating the prediction performance of six standalone machine learning models, namely multi-layer perceptron neural network (MLPNN), support vector machine (SVM), random forest (RF), extremely randomized trees (ET), eXtreme gradient boosting (XGBoost) and categorical gradient boosting (CatBoost), as well as four novel hybrid models (MLPNN-SVM, ET-CatBoost, MLPNN-CatBoost and XGBoost-CatBoost). Finally, a sensitivity analysis based on the locally optimal prediction models was conducted to identify the minimum sets of indicative WQPs needed for more cost-efficient water quality prediction. The results revealed that water quality in the study area was spatially heterogeneous, and four spatially contiguous clusters were identified. MLPNN-SVM, ET-CatBoost, MLPNN-CatBoost and CatBoost performed best in Clusters 1 to 4, with R² values of 0.917, 0.906, 0.901 and 0.937 and RMSE values of 1.978, 0.843, 2.020 and 1.572, respectively. The sensitivity analysis indicated that acceptable local prediction results can be obtained using fewer WQPs. This finding is conducive to issuing timely water quality warnings and gaining more time for water pollution remediation.
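
The local model selection step described above can be illustrated with a minimal sketch. It assumes that each candidate model has already produced held-out predictions for each cluster and that the locally optimal model is the one with the lowest RMSE; the abstract reports both R² and RMSE but does not state the selection rule, so this criterion is an assumption. The function names, the model labels `model_a` and `model_b`, and the toy values are hypothetical and are not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def select_local_models(observed, predictions):
    """Pick, for each cluster, the candidate model with the lowest RMSE.

    observed:    {cluster: [observed values]}
    predictions: {cluster: {model name: [predicted values]}}
    Returns {cluster: (best model name, RMSE, R2)}.
    Assumption: lowest RMSE defines "locally optimal".
    """
    best = {}
    for cluster, y_true in observed.items():
        scores = {
            name: (rmse(y_true, y_pred), r2(y_true, y_pred))
            for name, y_pred in predictions[cluster].items()
        }
        winner = min(scores, key=lambda name: scores[name][0])
        best[cluster] = (winner, scores[winner][0], scores[winner][1])
    return best


# Hypothetical toy data: two clusters, two candidate models.
observed = {
    "Cluster 1": [70.0, 80.0, 90.0],
    "Cluster 2": [50.0, 60.0, 70.0],
}
predictions = {
    "Cluster 1": {"model_a": [72.0, 79.0, 91.0], "model_b": [65.0, 85.0, 95.0]},
    "Cluster 2": {"model_a": [40.0, 60.0, 80.0], "model_b": [51.0, 59.0, 70.0]},
}

best_models = select_local_models(observed, predictions)
for cluster, (name, err, score) in best_models.items():
    print(f"{cluster}: {name} (RMSE={err:.3f}, R2={score:.3f})")
```

In this toy case a different model wins in each cluster, which is the situation a spatially adaptive scheme is meant to exploit: a single global model would be suboptimal in at least one cluster.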

Full Text