Abstract

Density-based clustering methods are widely used in data mining, but most of them still perform poorly on data sets with extremely uneven distributions, such as manifold or ring-shaped data. This paper proposes a novel clustering method based on local peaks in the symmetric neighborhood. Local peaks are points whose density is maximal at the local level. While searching for local peaks, all points except outliers are divided into a number of small clusters according to the local peaks in each point's neighborhood. A graph-based scheme then merges similar clusters based on their similarity in the symmetric neighborhood graph, and each outlier is finally assigned to the closest cluster. The proposed method is tested on a variety of artificial and real data sets, including a real building data set, and compared against popular density-based methods and other algorithms.

Highlights

  • Clustering is an indispensable and fundamental method for data mining, which attempts to classify data objects into categories or clusters on the basis of their similarity

  • Once the symmetric neighborhood is obtained, the method computes the local density of each point and divides the data set by searching for local peaks; the time complexity of these two steps is O(lN) and O(lN^2) respectively, where l is the number of points in the symmetric neighborhood of point i

  • The results show that the accuracy (Acc) and Normalized Mutual Information (NMI) scores of LP-SNG are higher than those of the AP, DBSCAN, DPC and k-means algorithms on most data sets
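The local-peak step mentioned in the highlights can be illustrated with a small sketch. The source does not spell out the exact definitions here, so the following choices are assumptions made only for illustration: the symmetric neighborhood is taken to be the mutual k-nearest-neighbor relation, the local density is the inverse of the mean distance to the k nearest neighbors, and a point with an empty symmetric neighborhood is treated as an outlier. The function name `local_peak_partition` is hypothetical, and the later graph-based merging of clusters is not shown.

```python
import numpy as np

def local_peak_partition(points, k):
    """Label each point with the index of its local peak (-1 for outliers).

    Assumptions (not taken from the paper): mutual-kNN symmetric
    neighbourhood and inverse-mean-kNN-distance density.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)          # a point is not its own neighbour

    knn = np.argsort(dist, axis=1)[:, :k]   # k nearest neighbours of each point
    is_nn = np.zeros((n, n), dtype=bool)
    is_nn[np.arange(n)[:, None], knn] = True
    sym = is_nn & is_nn.T                   # i and j are each other's neighbours

    # Assumed density estimate; the small constant avoids division by zero.
    density = 1.0 / (np.take_along_axis(dist, knn, axis=1).mean(axis=1) + 1e-12)

    # Each point points to the densest symmetric neighbour if that neighbour
    # is strictly denser than itself; otherwise it is a local peak.
    parent = np.arange(n)
    for i in range(n):
        nbrs = np.flatnonzero(sym[i])
        if nbrs.size:
            best = nbrs[np.argmax(density[nbrs])]
            if density[best] > density[i]:
                parent[i] = best

    # Follow the pointers up to a local peak. Density strictly increases
    # along each step, so the walk always terminates.
    labels = np.full(n, -1)
    for i in range(n):
        if not sym[i].any():
            continue                        # outlier: left unassigned here
        j = i
        while parent[j] != j:
            j = parent[j]
        labels[i] = j
    return labels
```

With these assumptions, every non-outlier point ends up in the small cluster of the local peak it climbs to, which mirrors the division into small clusters described in the abstract.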


Summary

INTRODUCTION

Z. Liu et al.: Local Peaks-Based Clustering Algorithm in Symmetric Neighborhood Graph

Clustering is an indispensable and fundamental method for data mining that attempts to classify data objects into categories, or clusters, on the basis of their similarity. To obtain a parameter-free clustering technique, kNN-DBSCAN [5] introduces the k-nearest neighbors graph into DBSCAN, where k > 0. Another typical density-based method is the density peak clustering (DPC [14]) algorithm, which uses the local density of a point and its distance from points of higher density to measure whether the point is a cluster center. Existing density-based algorithms such as DBSCAN and DPC handle manifold data with different densities poorly. To solve this problem, we propose a new algorithm based on local peaks in the symmetric neighborhood graph, called LP-SNG. The main contribution of this article is a new density-based algorithm that fits manifold data with different densities, built on a new perspective: local peaks among symmetric neighbors rather than the global peaks used in DPC and DPC-SNR.
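To make the two DPC quantities mentioned above concrete, here is a minimal sketch of how they are commonly computed: the local density rho (the number of points within a cutoff distance) and delta (the distance to the nearest point of higher density). The function name `dpc_quantities` and the cutoff parameter `d_c` are illustrative names, not taken from the paper, and the cutoff-count density is only one of the usual variants.

```python
import numpy as np

def dpc_quantities(points, d_c):
    """Return (rho, delta) per point using the cutoff-count form of DPC."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances, shape (n, n).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    # rho_i: number of other points closer than d_c (subtract 1 for the point itself).
    rho = (dist < d_c).sum(axis=1) - 1

    # delta_i: distance to the nearest point with strictly higher density.
    # For a point with no denser point, use its largest distance instead.
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        if higher.any():
            delta[i] = dist[i, higher].min()
        else:
            delta[i] = dist[i].max()
    return rho, delta
```

Points with both large rho and large delta are the candidates for cluster centers in DPC. These are global peaks, which is the perspective the article contrasts with local peaks in the symmetric neighborhood.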

LOCAL DENSITY
LOCAL PEAKS IN SYMMETRIC NEIGHBORHOOD GRAPH
EXPERIMENTS
CONCLUSION
