Abstract

Border-peeling (BP) clustering is a recently proposed density-based algorithm that groups data by iteratively identifying border points and peeling them off until separable areas of data remain. On several test cases, BP correctly recovers the true cluster structure and automatically detects outliers. However, BP has drawbacks that may hinder its widespread application: it can yield poor results on datasets whose clusters are non-uniformly distributed, and it tends to over-partition data with complex shapes. To overcome these defects, this paper proposes a robust border-peeling clustering algorithm (ROBP). ROBP improves BP in two respects: density influence (i.e., density estimation) and linkage criterion (i.e., association strategy). For density estimation, we replace the Gaussian kernel in the local scaling function with a Cauchy kernel, which has longer tails, and build a kernel density estimator on it that computes the density influence value of each point quickly and accurately. For the association strategy, we design a linkage criterion based on shared neighborhood information; it creates links between peeled border points and their neighboring peeled border points to avoid over-segmenting clusters. Integrating this linkage criterion with the uni-directional association strategy yields a bi-directional association strategy. In experiments, we compare ROBP with seven representative density-based (or hierarchical) clustering algorithms, namely BP, DBSCAN, HDBSCAN, density peak (DP) clustering, DPC-KNN, DPC-DBFN, and McDPC, on 8 synthetic datasets and 11 real-world datasets. The results show that ROBP outperforms the seven competitors in most cases.
Moreover, we compare the robustness of ROBP and BP and evaluate their running times. The experimental results indicate that ROBP is much more robust and reliable than BP while remaining competitive with it in computational efficiency.
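
To make the density-estimation idea concrete, the following is a minimal sketch of a density influence computed with a Cauchy kernel and a local scale. The abstract does not state the formula, so everything here is an assumption for illustration: the kernel form `1 / (1 + (d / sigma)^2)`, the choice of `sigma` as the distance to the k-th nearest neighbor, the summation over reverse k-nearest neighbors, and the names `cauchy_kernel`, `gaussian_kernel`, `knn`, and `density_influence`. It is not the authors' implementation.

```python
import math

def cauchy_kernel(dist, sigma):
    """Cauchy-type kernel (assumed form): polynomial decay, so heavier tails than a Gaussian."""
    return 1.0 / (1.0 + (dist / sigma) ** 2)

def gaussian_kernel(dist, sigma):
    """Gaussian kernel, shown only for comparison with the Cauchy kernel."""
    return math.exp(-((dist / sigma) ** 2))

def knn(points, i, k):
    """Return the k nearest neighbors of points[i] as (distance, index) pairs, excluding i."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return dists[:k]

def density_influence(points, k, kernel=cauchy_kernel):
    """Hypothetical density influence: each point j adds kernel(d_ij, sigma_j)
    to every point i that appears among j's k nearest neighbors."""
    n = len(points)
    neighbors = [knn(points, i, k) for i in range(n)]
    # Assumed local scale: distance to the k-th neighbor (guarded against zero).
    sigmas = [max(nbrs[-1][0], 1e-12) for nbrs in neighbors]
    influence = [0.0] * n
    for j in range(n):
        for d, i in neighbors[j]:
            influence[i] += kernel(d, sigmas[j])
    return influence

# Four points on a unit square plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
for p, b in zip(points, density_influence(points, k=2)):
    print(p, round(b, 4))
```

Under these assumptions, a point that no other point counts among its nearest neighbors receives zero influence, which is the kind of low value a peeling step would treat as border or outlier evidence. Swapping in `gaussian_kernel` shows how much faster the Gaussian discounts distant neighbors.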
