Abstract

The density peaks clustering (DPC) algorithm is a novel density-based clustering algorithm that is simple and efficient, does not require the number of clusters to be specified in advance, and can find nonspherical clusters of arbitrary shape. However, DPC relies heavily on how the cutoff distance threshold and the local density are calculated, and it cannot analyze complex manifold data, especially datasets with uneven density distribution or with multiple density peaks in the same cluster. To solve these problems, we propose an improved density peaks clustering algorithm based on layered k-nearest neighbors and subcluster merging (LKSM_DPC). First, we redefine the local density calculation using layered k-nearest neighbors: to adapt to datasets with different densities, the k-nearest neighbors of a point are divided into multiple layers. Second, to handle multiple peaks in the same cluster, we design a new mechanism that calculates the similarity of subclusters based on the idea of shared neighbors and Newton's law of gravitation, and we propose a subcluster merging strategy built on it. To evaluate the effectiveness of our algorithm, we compare LKSM_DPC with K-means, DBSCAN, DPC, and DPC derivatives on 24 datasets. Extensive experiments show that our algorithm often outperforms the other algorithms.

Highlights

  • Clustering is one of the most important techniques in data mining

  • Although the literature has contributed improvements to density peaks clustering (DPC), several problems remain: (1) most scholars have used the idea of the k-nearest neighbors to calculate the local density, but few have considered how these k points are distributed, especially when the data density distribution is uneven; and (2) in a 2-D decision graph, it is difficult to determine the real cluster centers, especially when there are multiple peaks in a cluster

  • To solve the above problems, we propose a novel density peaks clustering algorithm based on layered k-nearest neighbors and subcluster merging (LKSM_DPC)

Summary

INTRODUCTION

Clustering is one of the most important techniques in data mining. It groups data with similar characteristics into the same cluster, so that different clusters differ significantly from one another [1], [2]. Cheng et al. [25] addressed the problem that DPC cannot process manifold datasets. They proposed an improved density peaks clustering algorithm based on shared neighbors between local cores (LORE-DP), built on a redefined natural-neighbor-based density and a newly defined graph-based distance. Although the literature has contributed improvements to DPC, several problems remain: (1) most scholars have used the idea of the k-nearest neighbors to calculate the local density, but few have considered how these k points are distributed, especially when the data density distribution is uneven; and (2) in a 2-D decision graph, it is difficult to determine the real cluster centers, especially when there are multiple peaks in a cluster.
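
For orientation, the quantities that the discussion above refers to (local density, k-nearest-neighbor density, and the 2-D decision graph) can be sketched as follows. This is a minimal illustration of baseline DPC and of one commonly used k-nearest-neighbor density, not the layered density or the merging strategy proposed in this paper. The function names, the toy data, and the specific kNN formula (exponential of the negative mean squared distance to the k nearest neighbors) are assumptions chosen for illustration.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def cutoff_density(D, d_c):
    """Baseline DPC density: number of other points closer than d_c."""
    return (D < d_c).sum(axis=1) - 1  # subtract 1 to exclude the point itself

def knn_density(D, k):
    """One common kNN-based density (an assumed form, not LKSM_DPC's):
    exp(-mean squared distance to the k nearest neighbors)."""
    knn = np.sort(D, axis=1)[:, 1:k + 1]  # column 0 is the self-distance 0
    return np.exp(-(knn ** 2).mean(axis=1))

def delta_distance(D, rho):
    """Distance from each point to its nearest point of higher density.
    Ties in rho are broken by index order; the densest point receives
    its largest distance to any other point."""
    n = len(rho)
    order = np.argsort(-rho, kind="stable")  # densest point first
    delta = np.empty(n)
    delta[order[0]] = D[order[0]].max()
    for pos in range(1, n):
        i = order[pos]
        delta[i] = D[i, order[:pos]].min()
    return delta

# Toy data (hypothetical): two square groups, each with a central point.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
              [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5]])
D = pairwise_distances(X)
rho = knn_density(D, k=4)
delta = delta_distance(D, rho)

# Decision graph: points with both large rho and large delta are
# candidate cluster centers; gamma = rho * delta ranks them.
gamma = rho * delta
print(sorted(np.argsort(-gamma)[:2].tolist()))
```

In this toy case every point has the same cutoff-kernel density for a moderate d_c, so that kernel cannot single out a center, while the kNN-based density separates the central point of each group from its corners. The problems listed above arise when clusters differ strongly in density or contain several such peaks.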

DENSITY PEAKS CLUSTERING ALGORITHM
DENSITY PEAKS CLUSTERING BASED ON THE K-NEAREST NEIGHBORS
FUZZY WEIGHTED K-NEAREST NEIGHBORS DENSITY PEAKS CLUSTERING
OUR ALGORITHM
SIMILARITY AND SUBCLUSTER MERGING
EXPERIMENTS
Findings
CONCLUSIONS