Abstract

Recently, the density peaks clustering algorithm (DPC) has received considerable attention from researchers. DPC can find cluster centers and complete clustering tasks quickly, and it is suitable for many kinds of clustering tasks. However, choosing the cutoff distance d_c largely depends on human experience, which greatly affects the clustering results. In addition, the selection of cluster centers requires manual participation, which reduces the efficiency of the algorithm. To solve these problems, we propose a density peaks clustering algorithm based on K nearest neighbors with an adaptive merging strategy (KNN-ADPC). A cluster merging strategy is proposed to automatically aggregate over-segmented clusters, and the K nearest neighbors are adopted to assign data points more reasonably. KNN-ADPC has only one parameter, and the clustering task can be conducted automatically without human involvement. Experimental results on artificial and real-world datasets show higher accuracy of KNN-ADPC compared with DBSCAN, K-means++, DPC, and DPC-KNN.
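The two quantities at the heart of the baseline DPC algorithm referred to above, a point's local density rho under the cutoff distance d_c and its distance delta to the nearest point of higher density, can be sketched as below. This is a minimal illustration of classic DPC, not the proposed KNN-ADPC; the function name is ours, and ties in density are broken by ranking order as a simplifying assumption.

```python
# Sketch of the classic DPC quantities (baseline DPC, not KNN-ADPC):
# rho  = number of other points within cutoff distance d_c
# delta = distance to the nearest point of higher density
import numpy as np

def dpc_rho_delta(X, d_c):
    """Return (rho, delta) for each row of X given cutoff distance d_c."""
    # Pairwise Euclidean distance matrix.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    rho = (dist < d_c).sum(axis=1) - 1      # subtract 1 to exclude the point itself
    n = len(X)
    delta = np.empty(n)
    order = np.argsort(-rho)                # indices from highest to lowest density
    delta[order[0]] = dist[order[0]].max()  # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]               # points ranked denser (ties broken by order)
        delta[i] = dist[i, higher].min()
    return rho, delta

# Points with both large rho and large delta are candidate cluster centers;
# in DPC this selection step is done manually, which KNN-ADPC automates.
```

On two well-separated groups of points, exactly one point per group ends up with both high density and large delta, which is why the rho-delta decision graph exposes cluster centers.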

Highlights

  • Clustering is one of the most important machine learning techniques and has been widely applied in many fields, such as data mining and the chemical industry

  • We propose a density peaks clustering algorithm based on K nearest neighbors with adaptive merging strategy (KNN-ADPC)

  • Considering that actual clustering tasks may have no labels, performance can be evaluated by whether similar points are clustered into the same cluster while points with low similarity are divided into different clusters


Introduction

Clustering is one of the most important machine learning techniques and has been widely applied in many fields, such as data mining and the chemical industry. Many new clustering algorithms have been proposed, such as spectral clustering [5], multi-kernel clustering [6], multi-view clustering [7], subspace clustering [8], ensemble clustering [9], and deep embedded clustering [10]. The drawback of these new algorithms is that both their complexity and computational cost are higher than those of classical clustering algorithms. Among the classical algorithms, the result of K-means is vulnerable to the selection of the initial center points; K-means++ [13] can partially solve this problem. However, both K-means and K-means++ are inadequate for non-spherical clusters. The basic idea of DBSCAN is that clusters are determined according to density-connection relationships between points.
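The K-means++ [13] improvement mentioned above lies in its seeding step: each new center is sampled with probability proportional to the squared distance to the nearest already-chosen center, so the initial centers are spread apart. A minimal sketch of this standard D²-weighted seeding follows; the function name and the rng parameter are illustrative, not from the paper.

```python
# Sketch of K-means++ seeding (D^2 weighting): initial centers are
# spread apart, which stabilizes the subsequent K-means iterations.
import numpy as np

def kmeanspp_init(X, k, rng=None):
    """Pick k initial centers from the rows of X via D^2 weighting."""
    rng = np.random.default_rng(rng)
    centers = [X[rng.integers(len(X))]]           # first center: uniform at random
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()                     # D^2-proportional sampling weights
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)
```

Because a point already chosen as a center has squared distance zero, it can never be drawn again, and far-away points are strongly favored as later centers.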
