Abstract

High-dimensional data has attracted much attention because it contains more comprehensive information about samples. Clustering such high-dimensional data has become a crucial topic in unsupervised learning. Existing clustering methods often show limited applicability due to their high computational complexity and poor robustness to noise. To address this issue, we propose a novel robust landmark graph-based clustering algorithm for high-dimensional data (RLGCH), which inherits the advantages of both k-means++ and graph-based clustering by using the results of k-means++ as pseudo labels for landmark graph-based clustering. In particular, RLGCH can achieve more reasonable clustering effectiveness than methods that operate only in the low-dimensional space or only in the original space, since it performs k-means++ in the low-dimensional space and landmark graph-based spectral clustering in the original feature space. To avoid post-processing after optimization, the embedded factor matrix is constrained to be an indicator matrix rather than a simple nonnegative matrix. To enhance clustering robustness, the L2,1-norm is adopted to minimize the discrepancy between the k-means++ and landmark graph-based clustering results. To solve the RLGCH model, we establish a novel, efficient optimization strategy that obtains all sample categories directly. With this clustering model and optimization strategy combined, the computational complexity is linear in the number of samples and insensitive to the data dimensionality. Extensive experiments on seven real-world datasets and sixteen noisy datasets show that, compared with other state-of-the-art methods, RLGCH greatly improves clustering efficiency and robustness while achieving comparable or even better clustering effectiveness.
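
As a rough illustration of the pipeline the abstract refers to (landmark selection, an anchor graph over landmarks, a spectral embedding, and k-means++ labels), a minimal sketch is given below. It is not the RLGCH model itself: the joint optimization, the indicator-matrix constraint, and the L2,1-norm error term are omitted, and the names build_anchor_graph, n_landmarks, and sigma are illustrative assumptions, not identifiers from the paper.

```python
# Illustrative sketch only: generic landmark-graph spectral clustering with
# k-means++ labels, NOT the RLGCH model described in the abstract.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import euclidean_distances

def build_anchor_graph(X, landmarks, k=5, sigma=1.0):
    """Sparse sample-to-landmark similarity matrix Z (n_samples x n_landmarks)."""
    d = euclidean_distances(X, landmarks)      # distances from samples to landmarks
    Z = np.zeros_like(d)
    idx = np.argsort(d, axis=1)[:, :k]         # k nearest landmarks per sample
    for i, nn in enumerate(idx):
        w = np.exp(-d[i, nn] ** 2 / (2 * sigma ** 2))
        Z[i, nn] = w / w.sum()                 # row-normalized Gaussian weights
    return Z

def landmark_spectral_clustering(X, n_clusters, n_landmarks=100, seed=0):
    # 1. Choose landmarks (here simply k-means centroids over the raw data).
    km = KMeans(n_clusters=n_landmarks, n_init=3, random_state=seed).fit(X)
    Z = build_anchor_graph(X, km.cluster_centers_)
    # 2. Spectral embedding from the anchor graph: the left singular vectors of
    #    the column-normalized Z approximate the graph Laplacian eigenvectors.
    deg = Z.sum(axis=0)
    Z_hat = Z / np.sqrt(deg + 1e-12)
    U, _, _ = np.linalg.svd(Z_hat, full_matrices=False)
    embedding = U[:, :n_clusters]
    # 3. k-means++ on the low-dimensional embedding gives the cluster labels
    #    (in RLGCH such labels would act as pseudo labels inside a joint model).
    labels = KMeans(n_clusters=n_clusters, init="k-means++",
                    n_init=10, random_state=seed).fit_predict(embedding)
    return labels
```

The key cost driver in this kind of pipeline is that all expensive operations involve the n_samples x n_landmarks matrix rather than an n_samples x n_samples graph, which is what makes a linear-in-samples complexity plausible for landmark-based methods.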
