Abstract
Cluster analysis, which is to partition a dataset into groups so that similar elements are assigned to the same group and dissimilar elements are assigned to different ones, has been widely studied and applied in various fields. The two challenging tasks in clustering are determining the suitable number of clusters and generating clusters of arbitrary shapes. This paper proposes a new concept of “epsilon radius neighbors” which plays an essential role in the cluster-forming process, thereby determining both the number of clusters and the shape of clusters, automatically. Based on “epsilon radius neighbors,” a new clustering algorithm in which the epsilon radius value is adapted to the characteristics of each cluster in the current partition is proposed. Recently, clustering has been widely applied in environmental applications, including underground water quality monitoring. However, the existing studies have simply applied conventional clustering techniques, in which the abovementioned two challenging tasks have not been solved already. Therefore, in this paper, the proposed clustering algorithm is applied in assessing the underground water quality in Phu My Town, Ba Ria-Vung Tau Province, Vietnam. The experimental results on benchmark datasets demonstrate the effectiveness of the proposed algorithm. For the quality of underground water, the new algorithm results in four clusters with different characteristics. Through this application, we found that the new algorithm might provide valuable reference information for underground water management.
Highlights
Cluster analysis is to discover the underlying structure of a dataset by partitioning the data into groups so that similar elements are assigned to the same group and dissimilar elements are assigned to different ones [1,2,3,4,5]
The k-means algorithm and its extensions usually require a user-defined number of clusters that is often unknown in practice. (i) the k-means algorithm constructs spherical clusters, which is unsuitable for arbitrary-shaped clusters. (ii) e above two problems have been the major drawbacks of clustering so far, which lead to many difficulties and challenges in solving this problem [6]
For (i), to determine the suitable number of clusters, the most commonly used approach is running the clustering algorithm several times with different number of clusters each time, and evaluating them based on a number of Scientific Programming internal validity measures, such as S-index, F-index, Dunn index, and Xie-Beni index [11,12,13,14]. is approach can investigate the suitable number of clusters, but it repeats the clustering process many times to find the best number of clusters, thereby increasing the amount of time and space required, according to [6]
Summary
Cluster analysis is to discover the underlying structure of a dataset by partitioning the data into groups so that similar elements are assigned to the same group and dissimilar elements are assigned to different ones [1,2,3,4,5]. E term “parameter free” means that the algorithm can automatically determine the number of clusters without requiring any user-defined parameters. For this purpose, PFClust performs an agglomerative algorithm on many subdatasets that are randomly sampled several times. In spite of outputting the number of clusters and partitioning automatically, the metaheuristic optimization method requires a few of its own user-defined parameters that have effects on the optimal solution. Erefore, in this paper, the proposed clustering algorithm is applied in assessing the underground water quality in Phu My Town, Ba Ria-Vung Tau Province, Vietnam.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.