Abstract

Minimum spanning tree (MST)-based clustering methods can identify clusters of arbitrary shape by removing inconsistent edges. How to define an inconsistent edge is a major issue that every MST-based clustering algorithm has to address. In this paper, we propose a novel MST-based clustering algorithm built on a cluster center initialization algorithm, called cciMST. First, to capture the intrinsic structure of the data sets, we propose a cluster center initialization algorithm based on geodesic distance and dual densities of the points. Second, we propose and demonstrate that the inconsistent edge is located on the shortest path between the cluster centers, so we can find it using the lengths of the edges on that path as well as the densities of their endpoints. Correspondingly, we obtain two groups of clustering results. Third, we propose a novel intercluster separation measure computed from the distances between the points at the intersection of clusters. Furthermore, we propose a new internal clustering validation measure to select the best clustering result. Experimental results on synthetic, real, and image data sets demonstrate the good performance of the proposed MST-based method.
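
To make the idea of cutting an inconsistent edge on the path between two cluster centers concrete, the following is a minimal sketch, not the cciMST algorithm itself. It assumes the two centers are already given (the paper derives them from geodesic distance and dual densities, which is not reproduced here), and it uses only the edge-length criterion, ignoring endpoint densities. The function names `prim_mst`, `tree_path`, and `split_by_longest_edge` are hypothetical and chosen for illustration.

```python
import math
from collections import defaultdict


def prim_mst(points):
    """Return the MST edges (i, j, length) of the complete Euclidean graph."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def tree_path(edges, src, dst):
    """Return the unique path from src to dst in a tree, as a list of edges."""
    adj = defaultdict(list)
    for e in edges:
        i, j, _ = e
        adj[i].append((j, e))
        adj[j].append((i, e))
    stack = [(src, -1, [])]
    while stack:
        node, prev, path = stack.pop()
        if node == dst:
            return path
        for nxt, e in adj[node]:
            if nxt != prev:
                stack.append((nxt, node, path + [e]))
    return []


def split_by_longest_edge(points, center_a, center_b):
    """Cut the longest MST edge on the path between two assumed centers.

    Returns one label per point: 0 for the component containing center_a,
    1 for the other component.
    """
    edges = prim_mst(points)
    path = tree_path(edges, center_a, center_b)
    cut = max(path, key=lambda e: e[2])  # length-only criterion (assumption)
    adj = defaultdict(list)
    for i, j, _ in edges:
        if (i, j) != (cut[0], cut[1]):
            adj[i].append(j)
            adj[j].append(i)
    labels = [1] * len(points)
    labels[center_a] = 0
    stack = [center_a]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if labels[v] == 1:
                labels[v] = 0
                stack.append(v)
    return labels
```

For example, calling `split_by_longest_edge` on two well-separated groups of points on a line, with one center index from each group, cuts the single long edge that bridges the groups. Restricting the search to the path between the centers means that long edges elsewhere in the tree, such as those leading to outliers, are never candidates for removal.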

Introduction

Clustering aims to group a set of objects into clusters such that objects in the same cluster are similar and objects in different clusters are dissimilar. Existing clustering methods, such as partitional, hierarchical, density-based, and grid-based approaches, are not completely satisfactory because of the multiplicity of problems and data distributions [2,3,4]. As a well-known partitional clustering algorithm, the K-means algorithm often assumes a spherical shape structure of the underlying data, so it cannot detect clusters with irregular boundaries. DBSCAN is a classical density-based clustering algorithm that can find clusters with arbitrary shapes. However, it requires two input parameters that are difficult to determine [4]. For MST-based clustering algorithms, the shape of the cluster boundary has little impact on performance, which allows us to overcome the problems commonly faced by the classical clustering algorithms [6].