Abstract
Spectral clustering algorithm (SC) has recently received great attention for its high performance in large-scale data clustering and simplicity in implementation. However, previous studies have demonstrated that the commonly used distance measures in SC cannot simultaneously consider global and local consistencies and are sensitive to various noises. As a result, the obtained similarity matrices are unable to capture the actual data structure and thus produce poor clustering results, especially when the data exhibits nonlinear and local manifold structures characteristics. In order to address those limitations, we present a spectral clustering algorithm using density-sensitive distance measure with global and local consistencies in this paper. In the presented algorithm, a novel manifold distance with exponential term and scaling factor is introduced as the pairwise similarity measure. By modifying its exponential term and scaling factor, we can flexibly adjust the ratio of the similarities between the data within the same manifold to the similarities between the data across different manifolds. This flexible adjustment is beneficial to the obtained similarity matrix more accurately reflecting the global and local consistencies of data structures. In addition, to eliminate the effect of noises on the clustering performance, we also incorporate the relative density sensitive term into the distance measure to take into account the local distribution characteristics of the data. Finally, to further improve clustering performance, we provide the SC-based k value determination method for k nearest neighbors (KNN) graph. In the experimental part, the effect of parameters on the performance of the proposed technique is discussed and some suggestions about the determination of the parameters are given. Theoretical analysis and experimental results on several synthetic datasets, UCI benchmark datasets and generated large MNIST handwritten digits datasets demonstrate that the proposed approach is superior to other existing spectral clustering techniques with good robustness.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have