Abstract

In practical applications, the true number of clusters in a dataset is rarely known in advance. A cluster validity index is therefore needed to evaluate clustering results and determine the optimal number of clusters. However, the performance of existing cluster validity indices is sensitive to factors such as cluster shape and density. To address these issues, this paper proposes a new cluster validity index based on augmented non-shared nearest neighbors (ANCV). The ANCV index rests on two principles: (1) within-cluster compactness can be measured by the distance between pairs of data points that have few shared nearest neighbors; (2) between-cluster separation can be estimated from the distances between pairs of data points located at the intersection of clusters. On this basis, each such point pair is extended to its augmented non-shared nearest neighbors, forming small clusters. The average distances within and between these small clusters are then calculated to estimate the within-cluster compactness and the between-cluster separation, respectively. Finally, the optimal number of clusters is determined by the difference between the between-cluster separation and the within-cluster compactness. Experimental results on 12 two-dimensional synthetic datasets and 10 real datasets from UCI show that the ANCV index performs best among all compared indices.
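
The notion of shared and non-shared nearest neighbors that the index builds on can be illustrated with a minimal sketch. This is not the ANCV computation itself: the abstract does not define how the non-shared neighbors are "augmented", so the code below only shows plain k-nearest-neighbor sets under Euclidean distance. The helper names `knn_indices` and `shared_and_non_shared`, the value `k=2`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def knn_indices(X, k):
    """Return an (n, k) array; row i holds the indices of the k nearest
    neighbors of point i under Euclidean distance (the point itself excluded)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def shared_and_non_shared(nn, i, j):
    """Return (shared, non_shared) neighbor index sets for points i and j.
    'non_shared' here is the symmetric difference of the two neighbor sets,
    an illustrative choice rather than the paper's definition."""
    a, b = set(nn[i].tolist()), set(nn[j].tolist())
    return a & b, a ^ b

# Toy data (hypothetical): two well-separated square clusters of four points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]])
nn = knn_indices(X, k=2)

# Points 0 and 3 lie in the same cluster; points 0 and 4 lie in different ones.
shared_same, non_shared_same = shared_and_non_shared(nn, 0, 3)
shared_diff, non_shared_diff = shared_and_non_shared(nn, 0, 4)
print(shared_same, non_shared_same)
print(shared_diff, non_shared_diff)
```

In this toy case the two same-cluster points share both of their neighbors, while the two points from different clusters share none, which is the kind of signal a neighbor-based validity index can exploit.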

