Abstract

This paper presents an improved clustering algorithm for categorizing data with arbitrary shapes. Most conventional clustering approaches work only with round-shaped clusters. Clustering by fast search and find of density peaks (DPC) can handle arbitrary shapes, but in some cases it is limited by its definition of density peaks and by its allocation strategy. To overcome these limitations, two improvements are proposed in this paper. To describe the cluster center more comprehensively, the definitions of local density and relative distance are fused with multiple distances, including K-nearest neighbors (KNN) and shared-nearest neighbors (SNN). A similarity-first search algorithm is designed to find the best-matching cluster centers for noncenter points in a weighted KNN graph. Extensive comparisons with several existing methods, i.e., the traditional DPC algorithm, density-based spatial clustering of applications with noise (DBSCAN), affinity propagation (AP), FKNN-DPC, and K-means, have been carried out. Experiments on synthetic and real data show that the proposed clustering algorithm outperforms DPC, DBSCAN, AP, and K-means in terms of clustering accuracy (ACC), adjusted mutual information (AMI), and adjusted Rand index (ARI).
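For context, the classic DPC decision rule that the paper improves on selects as cluster centers the points with both high local density and high relative distance. A minimal sketch of those two quantities (not the paper's modified, neighbor-fused definitions; the cutoff `d_c` is a user-chosen parameter) is:

```python
import numpy as np

def dpc_quantities(X, d_c):
    """Compute the two quantities classic DPC uses to pick centers:
    local density rho (number of points within cutoff d_c) and
    relative distance delta (distance to the nearest denser point)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = (D < d_c).sum(axis=1) - 1            # exclude the point itself
    delta = np.empty(len(X))
    order = np.argsort(-rho)                    # indices by decreasing density
    delta[order[0]] = D[order[0]].max()         # densest point: farthest distance
    for k, i in enumerate(order[1:], start=1):
        delta[i] = D[i, order[:k]].min()        # distance to nearest denser point
    return rho, delta

# Points with both high rho and high delta are candidate cluster centers.
```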

Highlights

  • Previous works [11, 12] provided several methods for selecting the initial clustering center and improving the accuracy of clustering

  • Adding attributes related to neighbors in the clustering process can help to make a correct judgment. Therefore, we introduce the concept of shared-nearest neighbor (SNN) proposed in [22] when defining the local density and the relative distance

  • In order to visually observe the clustering ability of the density peak clustering algorithm DPC-SFSKNN, the DPC [20], density-based spatial clustering of applications with noise (DBSCAN) [15], affinity propagation (AP) [8], FKNN-DPC [9], and K-means [10] methods are all tested for comparison. Three popular benchmarks are used to evaluate the performance of the above clustering algorithms: the clustering accuracy (ACC), adjusted mutual information (AMI), and adjusted Rand index (ARI) [35]. The upper bounds of the three benchmarks are all 1, and the larger the benchmark value, the better the clustering effect. The codes for DPC, DBSCAN, and AP were provided based on the corresponding references

Summary

Introduction

Previous works [11, 12] provided several methods for selecting the initial clustering center and improving the accuracy of clustering. To address the aforementioned problems, density-based clustering methods have been proposed, which can find clusters of various shapes and sizes in noisy data, where high-density regions are considered as the clusters and are separated by low-density regions [15,16,17,18,19]. FKNN-DPC uses fuzzy weighted K-nearest neighbor technology to allocate the remaining points, while the SNN approach determines the cluster of a remaining point based on whether its number of shared neighbors reaches a threshold. This paper proposes an improved clustering algorithm based on density peaks (named DPC-SFSKNN). It has the following new features: (1) the local density and the relative distance are redefined, fusing the distance attributes of the two neighbor relationships (KNN and SNN), and (2) the allocation strategy is made fault tolerant.
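As a concrete illustration of the SNN relationship that the redefined local density and relative distance build on, the shared-neighbor count between two points can be sketched as follows (a hypothetical helper for illustration, not the paper's exact fused definitions; `k` is the neighborhood size):

```python
import numpy as np

def snn_similarity(X, k):
    """Shared-nearest-neighbor similarity: snn[i, j] counts how many
    of the k nearest neighbors points i and j have in common, so
    points deep inside the same dense region score higher."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # k nearest neighbors of each point (column 0 is the point itself)
    knn = np.argsort(D, axis=1)[:, 1:k + 1]
    neighbor_sets = [set(row) for row in knn]
    n = len(X)
    snn = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            snn[i, j] = snn[j, i] = len(neighbor_sets[i] & neighbor_sets[j])
    return snn
```

A weighted KNN graph for the similarity-first search can then use these shared-neighbor counts as edge weights between neighboring points.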
