Clustering of Time Series Water Quality Data Using Dynamic Time Warping: A Case Study from the Bukhan River Water Quality Monitoring Network

Seulbi Lee, Jongyeon Hwang, Tae-Young Heo, Eunji Lee, Jungsu Park, Jaehoon Kim, Kyoung-Jin Lee, Jeongkyu Oh

doi:10.3390/w12092411

Open Access

https://doi.org/10.3390/w12092411

Abstract

It is essential to monitor water quality for river water management because river water is used for various purposes and is directly related to the health and safety of a population. Properly installing and removing monitoring stations is an important part of water quality monitoring and of network operation efficiency. To this end, cluster analysis based on the calculated similarity between measuring stations can be used. In this study, we measured the similarities between 12 water quality monitoring stations of the Bukhan River. River water quality data always have a station-dependent time lag because water flows from upstream to downstream; therefore, we proposed using the Dynamic Time Warping (DTW) algorithm, which searches for the minimum distance by shifting and comparing time-points, rather than the Euclidean algorithm, which compares only the same time-point. Both the Euclidean and DTW algorithms were applied to nine water quality variables to identify similarities between stations, and K-medoids cluster analysis was performed based on the similarity. The Clustering Validation Index (CVI) was used to select the optimal number of clusters. Our results show that the Euclidean algorithm formed clusters that mixed mainstream and tributary stations; the mainstream stations were largely divided into three different clusters. In contrast, the DTW algorithm formed clear clusters that reflected the characteristics of water quality and watershed. Furthermore, because the Euclidean algorithm requires the time series to have the same length, data loss was inevitable. As a result, even where clusters were the same as those obtained by DTW, the characteristics of the water quality variables in the cluster differed. The DTW analysis in this study provides useful information for understanding the similarity or difference in water parameter values between different locations. Thus, the number and location of required monitoring stations can be adjusted to improve the efficiency of field monitoring network management.
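
The contrast between the two distance measures can be illustrated with a minimal sketch. It is not the implementation used in the study: the function names, the absolute-difference local cost, and the toy series (a pulse observed one time step later at a hypothetical downstream station) are all assumptions made for illustration.

```python
import math


def euclidean_distance(a, b):
    """Compare the series point by point; both must have the same length."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs series of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Textbook dynamic-programming DTW with |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i points of a
    # with the first j points of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats a point
                cost[i][j - 1],      # b advances, a repeats a point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Toy data (assumed): the same pulse, arriving one step later downstream.
upstream = [0, 0, 1, 2, 1, 0, 0]
downstream = [0, 0, 0, 1, 2, 1, 0]

print(euclidean_distance(upstream, downstream))  # penalises the time lag
print(dtw_distance(upstream, downstream))        # warps the lag away
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))     # unequal lengths are allowed
```

In this toy case the Euclidean distance is nonzero purely because of the lag, while DTW aligns the two pulses. The last call shows that DTW does not need the series to be trimmed to a common length.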

Highlights

River water is used for various purposes and is directly related to the health and safety of a population
This study aimed to measure similarities between water quality data measured at different water quality monitoring stations by using the Dynamic Time Warping (DTW) algorithm to perform cluster analysis; the results were compared with those clustered using the Euclidean algorithm
The candidate number of clusters was varied from two to five, and the optimal number was determined based on the Clustering Validation Index (CVI) (Table 4)
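
The selection of the number of clusters can be sketched as follows. This is an illustrative outline under stated assumptions, not the study's procedure: the mean silhouette width stands in for the CVI (the text above does not say which index was used), the k-medoids routine is a simple alternating variant with a naive start, and the distance matrix is built from six made-up values rather than from station data.

```python
def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix (list of lists)."""
    n = len(dist)
    medoids = list(range(k))  # naive deterministic start: the first k items
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties out
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(
                min(members, key=lambda i: sum(dist[i][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


def mean_silhouette(dist, labels):
    """Mean silhouette width; used here as one possible validation index."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters score 0 by convention
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(dist[i][j] for j in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


# Toy dissimilarities (assumed): in practice this would be the pairwise
# station-to-station distance matrix.
values = [0, 1, 2, 10, 11, 12]
dist = [[abs(x - y) for y in values] for x in values]

scores = {k: mean_silhouette(dist, k_medoids(dist, k)[0]) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Because k-medoids only needs pairwise dissimilarities, the same routine works with either a Euclidean or a DTW distance matrix.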

Summary

Introduction

River water is used for various purposes (e.g., human consumption, agricultural irrigation) and is directly related to the health and safety of a population. It is essential to monitor water quality for river water management. To this end, the Ministry of Environment of the Republic of Korea has installed water quality monitoring networks along rivers nationwide. As the number of measurement stations increases, the time and cost of data analysis have increased. It is increasingly important to operate optimal water quality monitoring networks, including the efficient selection and removal of water quality measurement stations. Costs can be reduced by using cluster analysis to group stations with similar water quality characteristics, and then measuring water quality at a representative point selected in each cluster.
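
One way to pick a representative point per cluster is to take the member with the smallest total dissimilarity to the other members (the cluster medoid). The sketch below assumes this criterion; the station labels, cluster assignments, and dissimilarity values are hypothetical and do not come from the monitoring network.

```python
# Hypothetical within-cluster dissimilarities between station pairs.
pair_dist = {
    frozenset(pair): value
    for pair, value in [
        (("S1", "S2"), 1.0), (("S1", "S3"), 2.0), (("S2", "S3"), 1.0),
        (("S4", "S5"), 1.5), (("S4", "S6"), 3.0), (("S5", "S6"), 1.5),
    ]
}


def d(a, b):
    """Symmetric lookup; a station is at distance 0 from itself."""
    return 0.0 if a == b else pair_dist[frozenset((a, b))]


def representative_stations(clusters):
    """Per cluster, return the member with the smallest total distance to the rest."""
    return {
        name: min(members, key=lambda s: sum(d(s, t) for t in members))
        for name, members in clusters.items()
    }


# Hypothetical cluster assignment.
clusters = {"A": ["S1", "S2", "S3"], "B": ["S4", "S5", "S6"]}
reps = representative_stations(clusters)
print(reps)
```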

Objectives

Results

Discussion

Conclusion