The systematic surveillance of nutrients and organic pollution in urban rivers is crucial for enhancing ecological integrity and promoting societal and economic sustainability. Currently, the primary methods of water quality monitoring are on-site sampling and laboratory analysis, which are constrained by factors such as terrain and climate. Remote sensing of water quality, which enables large-scale, periodic, and comprehensive coverage, serves as an important supplement to these traditional methods. However, current research on water quality monitoring relies predominantly on remote sensing technology and often overlooks other multi-source data. In this study, we examined rivers in the Weihe River Basin by integrating field samples, Sentinel-2 multispectral imagery, meteorological elements, and land use types to construct machine learning (ML) models for predicting four water quality parameters (WQPs): ammonia nitrogen (NH₃-N), total phosphorus (TP), chemical oxygen demand (COD), and dissolved oxygen (DO). The results showed that land use types significantly influenced the accuracy of predictions for NH₃-N, TP, COD, and DO. Among the models evaluated, Extra Tree Regression (ETR), eXtreme Gradient Boosting (XGBoost), and Gradient Boosting Regression (GBR) demonstrated the highest accuracy and transferability for monitoring WQPs in rivers. For instance, the models achieved the following coefficients of determination (R²) in 5-fold cross-validation: for NH₃-N, R² was 0.65 in both the testing and validation datasets; for TP, R² was 0.71 and 0.68; for COD, R² was 0.50 and 0.47; and for DO, R² was 0.68 and 0.64, respectively. Our findings therefore underscore the feasibility of using multi-source data and ML methods to quantify water pollutants in urban rivers, providing essential technical support for monitoring the spatiotemporal dynamics of river water quality across extensive geographical areas.
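The evaluation metric reported above, the coefficient of determination under 5-fold cross-validation, can be illustrated with a minimal sketch. It is not the study's pipeline: a one-variable least-squares line stands in for the tree ensembles (ETR, XGBoost, GBR), the data are synthetic, and the names `band_ratio`, `tp_mg_l`, `fit_line`, `k_fold_indices`, and `cross_validated_r2` are hypothetical, introduced only for illustration.

```python
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(x, y):
    """Ordinary least squares for y = a*x + b (stand-in for a tree ensemble)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, y_bar - a * x_bar


def cross_validated_r2(x, y, k=5, seed=0):
    """Train on k-1 folds, score R2 on the held-out fold, for every fold."""
    scores = []
    for held_out in k_fold_indices(len(x), k, seed):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        preds = [a * x[i] + b for i in held_out]
        scores.append(r2_score([y[i] for i in held_out], preds))
    return scores


# Synthetic data only (assumption): a hypothetical spectral band ratio
# and a hypothetical TP concentration that depends on it linearly plus noise.
rng = random.Random(42)
band_ratio = [rng.uniform(0.5, 2.0) for _ in range(100)]
tp_mg_l = [0.1 * r + 0.05 + rng.gauss(0, 0.02) for r in band_ratio]

fold_scores = cross_validated_r2(band_ratio, tp_mg_l)
print("per-fold R2:", [round(s, 2) for s in fold_scores])
print("mean R2:", round(mean(fold_scores), 2))
```

Scoring each fold only on samples withheld from fitting is what lets the resulting R² speak to how well a model generalizes rather than how well it memorizes the training data.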