Stream-DBSCAN: A Streaming Distributed Clustering Model for Water Quality Monitoring

Chunxiao Mu, Yuxuan Wu, Jindong Zhao, Shouke Wei, Yanchen Hou

doi:10.3390/app13095408

Open Access

Journal: Applied Sciences
Publication Date: Apr 26, 2023
Citations: 2
License type: CC BY 4.0

Affiliation: Ocean University of China, Yantai University

Abstract

With the increasing use of wireless sensor networks in water quality monitoring, an enormous amount of streaming data is generated by widely deployed sensors. However, the batch mode currently used for data analysis can no longer accommodate the diverse combinations of monitoring indicators or meet the requirement for timely analysis results on an all-weather basis. To overcome these challenges and analyze large amounts of water quality data quickly and accurately, we propose Stream-DBSCAN, a distributed stream processing clustering model. First, real-time data streams are processed using the distributed stream computing framework Flink. Then, the DBSCAN clustering algorithm is applied to cluster each dataset as a different dimension of the cluster. Finally, the time distribution characteristics of the data in the same cluster are analyzed to identify patterns of water quality variation. The system can extract noise points from the data and identify sudden deterioration of water quality. We tested the model using datasets on three water quality indices, pH, ammonia nitrogen (NH4N), and turbidity, collected in the Yantai Menlou Reservoir from May to August 2019. The results demonstrate that the system can efficiently and quickly perform cluster analysis on streaming data. By analyzing the clustering results, we found that the daily variation of water quality and the sudden pollution events detected in the Menlou Reservoir are consistent with the actual situation.
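
The pipeline outlined in the abstract (windowed stream processing followed by DBSCAN, with noise points flagged) can be illustrated with a minimal sketch. This is not the authors' implementation: plain Python generators stand in for Flink, the function names `tumbling_windows` and `dbscan` are hypothetical, the pH readings are made-up values, and the parameters `eps` and `min_pts` are chosen only for illustration.

```python
from math import dist

NOISE = -1  # label given to points that belong to no cluster


def tumbling_windows(stream, size):
    """Group a stream into consecutive, non-overlapping windows (stand-in for Flink windowing)."""
    window = []
    for item in stream:
        window.append(item)
        if len(window) == size:
            yield window
            window = []
    if window:  # emit the final, possibly shorter, window
        yield window


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(nbrs)
    return labels


# Made-up pH readings; each reading is a 1-D point.
readings = [(7.1,), (7.2,), (7.15,), (7.3,), (9.8,), (7.25,)]
for window in tumbling_windows(readings, size=6):
    labels = dbscan(window, eps=0.25, min_pts=3)
    noise = [p for p, label in zip(window, labels) if label == NOISE]
    print(labels, noise)
```

In this toy window, the isolated reading of 9.8 has no neighbors within `eps` and is labeled as noise, which is the kind of point the abstract associates with sudden changes in water quality. A real deployment would need per-index scaling and parameter tuning, neither of which is shown here.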

Full Text