Abstract

Density Peak Clustering (DPC), a density-based clustering algorithm, has recently received great attention for its efficient clustering performance and simple implementation. However, empirical studies have demonstrated that the distance measures commonly used in DPC cannot simultaneously account for global and local consistency. As a result, the local densities estimated from them may fail to capture the ground-truth data structure and thus produce poor clustering results, especially when the clusters in a dataset exhibit multi-density manifold structures of different sizes. To address these limitations, we propose a novel density peak clustering algorithm that uses a manifold distance with adjustable global and local consistency. In the proposed algorithm, a novel manifold distance with an exponential term and a scaling factor is introduced to estimate the local densities of all data points. By modifying the exponential term and the scaling factor, we can flexibly adjust the ratio of the distance between data points within the same manifold to the distance between data points across different manifolds. This flexible adjustment helps the estimated local densities reflect the global and local consistency of the data structure more accurately. In addition, to deal effectively with clusters of different densities and sizes, we develop a compensation strategy, called the local-scale tuning distance, for the distance from each point to its nearest point with larger density. With the local-scale tuning distance, the underlying centers of clusters with different densities and sizes, especially clusters with low density or small size, stand out clearly in the decision graph, so that the proposed method can accurately identify the number of underlying clusters and thus obtain satisfactory clustering results.
In the experimental part, the effect of the scaling factor on the performance of the proposed technique is discussed and some suggestions for determining the parameters are given. Theoretical analysis and experimental results on several synthetic and real-world datasets demonstrate that the proposed approach outperforms other existing clustering techniques on three evaluation metrics with statistical significance.
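
For orientation, the two quantities plotted in a DPC decision graph (the local density of each point and its distance to the nearest point of larger density) can be sketched as follows. This is a minimal illustration of baseline DPC only, assuming plain Euclidean distance and a Gaussian-kernel density; it is not the manifold distance or the local-scale tuning distance proposed here, whose formulas the abstract does not state. The function name `dpc_quantities` and the cutoff parameter `d_c` are hypothetical names chosen for this sketch.

```python
import numpy as np


def dpc_quantities(X, d_c):
    """Baseline DPC decision-graph quantities (illustrative sketch).

    Assumptions: Euclidean distance and a Gaussian kernel with cutoff d_c.
    The proposed method would substitute its own manifold distance here.
    Returns (rho, delta, nearest_higher).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)

    # Pairwise Euclidean distance matrix, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density: sum of Gaussian kernel values over all other points
    # (subtract 1.0 to drop each point's own contribution, exp(0) = 1).
    rho = np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0

    # delta[i]: distance from i to the nearest point with larger density.
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    order = np.argsort(-rho)  # indices sorted by decreasing density

    # By convention, the densest point gets its largest distance to any point.
    delta[order[0]] = D[order[0]].max()

    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]  # all points ranked denser than i
        j = higher[np.argmin(D[i, higher])]
        delta[i] = D[i, j]
        nearest_higher[i] = j

    return rho, delta, nearest_higher
```

Points with both large `rho` and large `delta` are the candidate cluster centers in the decision graph; the remaining points are typically assigned to the cluster of their `nearest_higher` neighbor.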
