Abstract

Similarity calculation is one of the most critical steps in clustering analysis, especially for arbitrarily shaped, elongated structures. Density Peak Clustering (DPC) computes similarity using Euclidean distance alone, which makes it struggle to cluster arbitrarily shaped data. To address this deficiency, an Improved Connectivity Kernel (ICK) is presented that accelerates the Connectivity Kernel and helps DPC identify clusters with arbitrarily shaped structures. ICK consists mainly of two strategies. (i) Because the Connectivity Kernel is misled by outliers lying between two clusters when the density of those outliers is as high as that of the cluster backbones, ICK first extracts local centers according to the local density and relative location of points. This eliminates most outliers and boundary points without breaking the original distribution of the data, so the adverse impact of outliers is avoided and the time spent on many meaningless calculations is saved. (ii) ICK defines the dissimilarity between two local centers by their connection, following the Connectivity Kernel. However, instead of traversing the entire dataset, ICK considers only a few specific paths between two local centers to evaluate their connectivity, which further reduces the computational complexity of the algorithm to O(n log n). Experiments on synthetic and real-world datasets demonstrate the effectiveness and robustness of the proposed algorithm in practical applications.
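
To make the idea of connectivity-based dissimilarity concrete, the following is a minimal sketch, not the ICK algorithm itself. It assumes connectivity is measured as a minimax path distance (the smallest achievable "largest hop" over all paths between two points), which is one common reading of path-based dissimilarity. The function name `minimax_path_distances` and the toy data are hypothetical. The sketch traverses every point and runs in O(n^3); it omits the local-center extraction and the restricted path search the abstract describes.

```python
import math

def minimax_path_distances(points):
    """Hypothetical helper: all-pairs minimax path distance.

    The cost of a path is its longest single hop; the distance between
    two points is the smallest such cost over all paths joining them.
    """
    n = len(points)
    # Start from plain Euclidean distances (direct one-hop paths).
    d = [[math.dist(p, q) for q in points] for p in points]
    # Floyd-Warshall-style relaxation with (min, max) instead of (min, +).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(d[i][k], d[k][j])
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d

# An elongated chain of four points plus one distant point.
chain = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
dist = minimax_path_distances(chain)
```

In this toy example the two ends of the chain are 3 units apart in Euclidean terms, but their minimax path distance is 1 because they are linked by unit-length hops, while the isolated point stays far from the chain. This is the property that lets a connectivity-based measure follow elongated clusters where Euclidean distance alone does not.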
