Abstract

Density peaks clustering (DPC) requires the manual selection of cluster centers. To address this, this paper proposes a fast clustering method that selects cluster centers automatically. First, the method partitions the data into groups and marks each group as a core or boundary group according to its density. Second, it forms clusters by iteratively merging pairs of core groups whose distance is below a threshold, and selects each cluster's center at the densest position within the cluster. Finally, it assigns each boundary group to the cluster of its nearest cluster center. Experimental results show that the method eliminates the manual selection of cluster centers and improves clustering efficiency.
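The three-stage pipeline described above (mark core/boundary groups by density, merge nearby core groups, attach boundary groups to the nearest center) can be sketched as follows. This is a minimal illustration only, assuming each group is summarized by a centroid and a density value; the function name `cluster_groups` and all parameter names are hypothetical, not taken from the paper.

```python
import numpy as np

def cluster_groups(centroids, densities, density_thresh, merge_thresh):
    """Hypothetical sketch of the abstract's pipeline:
    merge dense 'core' groups, then attach 'boundary' groups
    to the cluster of the nearest cluster center."""
    core = np.where(densities >= density_thresh)[0]
    boundary = np.where(densities < density_thresh)[0]

    # Union-find over core groups, with path halving.
    parent = {i: i for i in core}
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Iteratively merge core groups closer than the threshold.
    for a_idx in range(len(core)):
        for b_idx in range(a_idx + 1, len(core)):
            a, b = core[a_idx], core[b_idx]
            if np.linalg.norm(centroids[a] - centroids[b]) < merge_thresh:
                parent[find(a)] = find(b)

    # Cluster center = densest core group within each merged cluster.
    clusters = {}
    for g in core:
        clusters.setdefault(find(g), []).append(g)
    centers = {root: max(members, key=lambda g: densities[g])
               for root, members in clusters.items()}

    labels = {g: find(g) for g in core}
    # Assign each boundary group to the nearest cluster center.
    for g in boundary:
        nearest = min(centers, key=lambda r: np.linalg.norm(
            centroids[g] - centroids[centers[r]]))
        labels[g] = nearest
    return labels, centers
```

For example, with two dense groups near each other, one dense group far away, and one sparse group, the two nearby core groups merge into a single cluster and the sparse boundary group attaches to whichever cluster center is closest.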

Highlights

  • Clustering [1,2,3,4] is an unsupervised or semi-supervised learning method. It aims to divide samples into clusters according to the similarity between samples, so that samples in the same cluster are as similar as possible and samples in different clusters are as dissimilar as possible

  • We find that the cluster centers usually appear at the densest position of each cluster

  • Density peaks clustering (DPC) uses the density of each sample (ρ) and the distance between a sample and its nearest higher-density sample (δ) to find the cluster centers and determine the sample labels


Summary

Introduction

Clustering [1,2,3,4] is an unsupervised or semi-supervised learning method. It aims to divide samples into clusters according to the similarity between samples, so that samples in the same cluster are as similar as possible and samples in different clusters are as dissimilar as possible. To address the high time complexity of DPC, Lu et al. [15] proposed a fast distributed density peaks clustering method based on the Z-value index. The method in [20] introduces natural neighbors to find local representations and calculates adaptive distances between those representations, which effectively reduces the running time, but it cannot automatically determine the cluster centers. Du et al. [21] proposed a k-nearest-neighbor DPC method based on principal component analysis; it uses k-nearest neighbors to calculate sample densities and principal component analysis to process high-dimensional data. DPC uses the density of each sample (ρ) and the distance between a sample and its nearest higher-density sample (δ) when finding the cluster centers and determining the sample labels. Computing these quantities gives DPC a time complexity of O(n²).
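The quantities ρ and δ come from the standard DPC formulation: ρ counts neighbors within a cutoff distance d_c, and δ is the distance to the nearest sample of higher density (for the densest sample, the maximum distance is used instead). A minimal sketch follows; the function name `dpc_rho_delta` and the choice of a cutoff kernel (rather than, say, a Gaussian kernel) are illustrative assumptions.

```python
import numpy as np

def dpc_rho_delta(X, d_c):
    """Compute DPC's rho (local density, cutoff kernel) and
    delta (distance to nearest higher-density sample)."""
    # Full pairwise distance matrix: this O(n^2) step dominates the cost.
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # Cutoff-kernel density: neighbors within d_c, excluding the sample itself.
    rho = (d < d_c).sum(axis=1) - 1
    n = len(X)
    delta = np.empty(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        if higher.size:
            delta[i] = d[i, higher].min()
        else:
            # Densest sample: use the maximum distance by convention.
            delta[i] = d[i].max()
    return rho, delta
```

Cluster centers are then the samples where both ρ and δ are large, which is why DPC ordinarily asks the user to pick them from a ρ-δ decision graph.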

The Proposed Method
Precision Experiment
Efficiency Experiment
Application

