Clustering noisy, high-dimensional, and structurally complex data has always been a challenging task. Because most existing clustering methods cannot handle both the adverse impact of noisy samples and the complex structure of the data, in this article we propose a novel robust and sparse possibilistic K-subspace (RSPKS) clustering algorithm that integrates subspace recovery and possibilistic clustering under a unified sparse framework. First, the proposed method sparsifies both the membership matrix and the subspace projection vector under a dual-sparse framework to handle high-dimensional noisy data. This unifies dimensionality reduction and clustering in a single objective function, which can be optimized through synchronous iteration. Second, the reconstruction error of each sample in its local subspace is used as the distance metric for classification. That is, each sample itself is treated as a clustering prototype, so that it is not affected by the structure of the overall data distribution. The problem of constructing clustering prototypes for data with complex structures can therefore be better addressed. Finally, to deal with nonlinear regions, the RSPKS method is further extended into a kernelized version, namely the kernelized RSPKS clustering algorithm. Experimental results on both synthetic and real-world datasets demonstrate that the proposed method outperforms state-of-the-art algorithms in terms of clustering accuracy.
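As a rough illustration of using subspace reconstruction error as the distance for assignment, the following Python sketch fits one plain PCA subspace per group and assigns each sample to the subspace that reconstructs it best. It is not the RSPKS algorithm: it omits the sparse possibilistic memberships, the sparse projection vectors, and the kernel extension, and the function names `fit_subspace`, `reconstruction_error`, and `assign` are hypothetical.

```python
import numpy as np


def fit_subspace(points, dim):
    """Fit an affine subspace to `points` with plain PCA (an assumption,
    not the sparse projection used by RSPKS). Returns (mean, basis)."""
    mean = points.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(points - mean, full_matrices=False)
    return mean, vt[:dim].T  # basis has shape (n_features, dim)


def reconstruction_error(x, mean, basis):
    """Squared residual left after projecting x onto the affine subspace."""
    centered = x - mean
    projected = centered @ basis @ basis.T
    return np.sum((centered - projected) ** 2, axis=-1)


def assign(x, subspaces):
    """Assign each row of x to the subspace with the smallest residual.
    Returns (labels, errors), where errors has shape (n_samples, n_subspaces)."""
    errors = np.stack(
        [reconstruction_error(x, mean, basis) for mean, basis in subspaces],
        axis=1,
    )
    return errors.argmin(axis=1), errors
```

A full K-subspace loop would alternate `assign` with refitting each subspace from its assigned samples; a possibilistic variant would replace the hard `argmin` with per-cluster typicality weights.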