Online hydrological and water quality monitoring data have become increasingly important for water environment management tasks such as assessment and modeling. However, online monitoring data often contain erroneous or incomplete records, which limits their operational use. In this study, we developed an automated data cleaning algorithm based on Seasonal-Trend decomposition using Loess (STL) and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). STL first identifies and corrects the more obvious anomalies in the time series; DBSCAN then refines the result, with the reverse nearest neighbor method employed to improve clustering accuracy. To improve anomaly detection, a two-level residual judgment threshold was applied. The algorithm was applied to three reservoirs in Shanghai, China, achieving a precision of 0.91 and a recall of 0.81 for dissolved oxygen and pH. The proposed algorithm can potentially be applied to clean environmental monitoring data with high accuracy and stability.
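The two-stage idea summarized in the abstract can be illustrated with a minimal sketch. It is one plausible reading, not the authors' implementation: the residuals are assumed to have already been produced by an STL decomposition, the threshold multipliers `low` and `high`, the MAD-based scale, and the parameters `eps` and `min_pts` are hypothetical choices, DBSCAN is run on the raw values only for illustration, and the reverse nearest neighbor enhancement is not included.

```python
from statistics import median

def two_level_flags(residuals, low=3.0, high=6.0):
    """Label each residual 'anomaly', 'suspect', or 'ok'.

    low/high are hypothetical multipliers of a robust spread (1.4826 * MAD).
    """
    med = median(residuals)
    scale = 1.4826 * median(abs(r - med) for r in residuals) or 1e-12
    flags = []
    for r in residuals:
        z = abs(r - med) / scale
        flags.append("anomaly" if z > high else "suspect" if z > low else "ok")
    return flags

def dbscan_1d(values, eps, min_pts):
    """Minimal DBSCAN on scalars; returns cluster labels, -1 for noise."""
    n = len(values)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:  # only core points expand the cluster
                queue.extend(nb_j)
    return labels

def clean_flags(values, residuals, eps=0.15, min_pts=3):
    """Stage 1: two-level residual threshold. Stage 2: DBSCAN on suspects."""
    flags = two_level_flags(residuals)
    kept = [i for i, f in enumerate(flags) if f != "anomaly"]
    labels = dbscan_1d([values[i] for i in kept], eps, min_pts)
    for i, lab in zip(kept, labels):
        if flags[i] == "suspect":
            flags[i] = "anomaly" if lab == -1 else "ok"
    return flags

# Made-up dissolved-oxygen-like readings (mg/L) and assumed STL residuals.
values = [8.0, 8.1, 7.9, 8.0, 8.2, 8.1, 15.0, 8.0, 7.9, 8.1, 8.6, 8.0]
residuals = [0.0, 0.1, -0.1, 0.0, 0.2, 0.6, 7.0, 0.0, -0.1, 0.1, 0.6, 0.0]
first_pass = two_level_flags(residuals)
flags = clean_flags(values, residuals)
print(flags)
```

In this toy series the spike at index 6 exceeds the upper threshold and is flagged directly. The readings at indices 5 and 10 fall between the two thresholds: the first lies in a dense range of values and is retained, while the second is isolated and is flagged in the DBSCAN step.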