Abstract
Density peaks clustering requires the manual selection of cluster centers. To address this, this paper proposes a fast clustering method that selects cluster centers automatically. First, the method groups the data and marks each group as a core or boundary group according to its density. Second, it determines clusters by iteratively merging pairs of core groups whose distance is below a threshold, and selects the cluster center at the densest position in each cluster. Finally, it assigns each boundary group to the cluster of its nearest cluster center. Experimental results show that the method eliminates the manual selection of cluster centers and improves clustering efficiency.
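The three-step pipeline in the abstract can be sketched roughly as follows. The paper's exact grouping rule and parameters are not given in this summary, so the grid-based grouping, the cell size, the density ratio, and the merge distance below are all illustrative assumptions, not the authors' method:

```python
import numpy as np
from collections import defaultdict

def auto_center_clustering(X, cell=1.0, merge_dist=1.5, density_ratio=0.5):
    """Illustrative sketch of the described pipeline (parameters hypothetical).
    1. Group samples (here: on a uniform grid); a group is 'core' if its
       sample count is at least density_ratio * the densest group's count.
    2. Merge core groups whose centroids are closer than merge_dist
       (union-find); each cluster's center is its densest group's centroid.
    3. Assign each boundary group to the nearest cluster center."""
    # Step 1: grid grouping and core/boundary marking.
    keys = [tuple(k) for k in np.floor(X / cell).astype(int)]
    groups = defaultdict(list)
    for i, k in enumerate(keys):
        groups[k].append(i)
    gkeys = list(groups)
    counts = np.array([len(groups[k]) for k in gkeys])
    centroids = np.array([X[groups[k]].mean(axis=0) for k in gkeys])
    core = counts >= density_ratio * counts.max()

    # Step 2: union-find merge of nearby core groups.
    parent = list(range(len(gkeys)))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a
    core_idx = np.where(core)[0]
    for a in core_idx:
        for b in core_idx:
            if a < b and np.linalg.norm(centroids[a] - centroids[b]) < merge_dist:
                parent[find(a)] = find(b)

    # Cluster centers: the densest group's centroid within each merged cluster.
    densest = {}
    for g in core_idx:
        r = find(g)
        if r not in densest or counts[g] > counts[densest[r]]:
            densest[r] = g
    roots = list(densest)
    center_pts = np.array([centroids[g] for g in densest.values()])

    # Step 3: core groups labeled by their root; boundary groups by
    # the nearest cluster center.
    labels = np.empty(len(X), dtype=int)
    for gi, k in enumerate(gkeys):
        if core[gi]:
            lab = roots.index(find(gi))
        else:
            lab = int(np.argmin(np.linalg.norm(center_pts - centroids[gi], axis=1)))
        labels[groups[k]] = lab
    return labels, center_pts
```

Because the merge step compares group centroids rather than individual samples, its cost depends on the number of groups, not the number of samples, which is the kind of saving the abstract's efficiency claim rests on.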
Highlights
Clustering [1,2,3,4] is an unsupervised or semi-supervised learning method. This method aims to divide the samples into different clusters according to the similarity between samples, so that samples in the same cluster are as similar as possible and samples in different clusters are as dissimilar as possible
We find that cluster centers usually appear at the densest position of each cluster
Density peaks clustering (DPC) uses the density of each sample (ρ) and the distance between the sample and its nearest higher-density sample (δ) when finding the cluster centers and determining the sample labels
Summary
Clustering [1,2,3,4] is an unsupervised or semi-supervised learning method. This method aims to divide the samples into different clusters according to the similarity between samples, so that samples in the same cluster are as similar as possible and samples in different clusters are as dissimilar as possible. To address the high time complexity of DPC, Lu et al. [15] proposed a fast distributed density peaks clustering method based on the Z-value index. In [20], natural neighbors are introduced to find local representations, and calculating an adaptive distance between local representations effectively reduces the runtime; however, that method cannot automatically determine the cluster centers. Du et al. [21] proposed a k-nearest-neighbor DPC method based on principal component analysis: it uses k-nearest neighbors to calculate the sample density and principal component analysis to process high-dimensional data. DPC uses the density of each sample (ρ) and the distance between the sample and its nearest higher-density sample (δ) when finding the cluster centers and determining the sample labels. Based on the above three parts, the time complexity of DPC is O(n²)
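For reference, the two DPC quantities ρ and δ, and the O(n²) pairwise-distance computation behind them, can be sketched in plain NumPy. The cutoff distance `dc` is a user-chosen parameter of standard DPC; this summary does not specify how it is set:

```python
import numpy as np

def dpc_rho_delta(X, dc):
    """Compute DPC's two per-sample quantities:
    rho   -- local density: number of other samples within cutoff dc;
    delta -- distance to the nearest sample of higher density.
    Both rely on the full pairwise distance matrix, hence O(n^2)."""
    n = len(X)
    # Pairwise Euclidean distances: O(n^2) time and memory.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Local density: count samples closer than dc, excluding the sample itself.
    rho = (d < dc).sum(axis=1) - 1
    delta = np.empty(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        if higher.size:
            delta[i] = d[i, higher].min()
        else:
            # Densest sample: conventionally assigned the maximum distance.
            delta[i] = d[i].max()
    return rho, delta
```

Cluster centers are then the samples where both ρ and δ are large; in standard DPC the user picks them by eye from the ρ–δ decision graph, which is exactly the manual step the summarized paper aims to remove.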