Abstract

Subspace clustering addresses the problem of clustering a set of unlabeled high-dimensional data points lying near a union of low-dimensional subspaces according to their subspace membership. The number and dimensions of the subspaces, as well as their orientations, are all unknown. Since the computational cost of subspace clustering algorithms crucially depends on the ambient space dimension, it is desirable to reduce the dimensionality of the data before clustering. Even when computational cost is not an issue, dimensionality reduction is advantageous because it leads to reduced storage and transmission capacity requirements, and enhances privacy. It is thus important to understand the impact of dimensionality reduction on the performance of subspace clustering algorithms. In this work, we investigate this question analytically by deriving performance guarantees for sparse subspace clustering (SSC) [1–3] applied to data whose dimensionality was reduced using random projections. SSC relies on computing a sparse linear representation of each data point in terms of all other data points. The versions of SSC we consider are basis pursuit (BP)-, Lasso-, and orthogonal matching pursuit (OMP)-SSC, which owe their names to the methods they use to find sparse representations. Our analytical results show that the dimensionality of the data can be reduced to the order of the dimensions of the subspaces without compromising the clustering performance, and reveal a tradeoff between the amount of dimensionality reduction tolerated and the affinities between the subspaces. Also, we numerically compare the effect of random projections and principal component analysis (PCA) on the clustering performance of SSC. In addition, we present a novel probabilistic performance guarantee for clustering the original data via OMP-SSC, ensuring correct clustering under very general conditions on the relative orientations of the subspaces.
Finally, we numerically study a modification of Lasso-SSC aimed at accelerating the algorithm. This modification is based on Lasso screening [4] and does not substantially reduce the processing time of Lasso-SSC in our experiments.
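
To make the pipeline described above concrete, the following is a minimal sketch of dimensionality reduction by random projection followed by the sparse-representation step of OMP-SSC. It is an illustration under stated assumptions, not the authors' implementation: the function names (`random_projection`, `omp_representation`, `ssc_omp_coefficients`), the Gaussian projection matrix, the column normalization, the sparsity level `s_max`, and the stopping tolerance are all hypothetical choices made for this example. The final spectral clustering step applied to the affinity matrix is omitted.

```python
import numpy as np

def random_projection(X, p, rng):
    """Project the columns of X (m x N) to p dimensions.
    Assumption: an i.i.d. Gaussian matrix is used as the random projection."""
    m = X.shape[0]
    Phi = rng.standard_normal((p, m)) / np.sqrt(p)
    return Phi @ X

def omp_representation(X, j, s_max, tol=1e-6):
    """Greedy OMP: represent column j of X using at most s_max other columns."""
    x = X[:, j]
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(s_max):
        if np.linalg.norm(residual) <= tol:
            break
        corr = np.abs(X.T @ residual)
        corr[j] = -np.inf              # a point may not represent itself
        if support:
            corr[support] = -np.inf    # never reselect a chosen column
        support.append(int(np.argmax(corr)))
        # Least-squares fit of x on the columns selected so far
        coef, *_ = np.linalg.lstsq(X[:, support], x, rcond=None)
        residual = x - X[:, support] @ coef
    z = np.zeros(X.shape[1])
    if support:
        z[support] = coef
    return z

def ssc_omp_coefficients(X, s_max):
    """Stack the sparse representations of all points as columns of Z (N x N)."""
    N = X.shape[1]
    return np.column_stack([omp_representation(X, j, s_max) for j in range(N)])

# Toy data (hypothetical sizes): two 3-dimensional subspaces in R^50, 20 points each
rng = np.random.default_rng(0)
m, d, n_per = 50, 3, 20
bases = [np.linalg.qr(rng.standard_normal((m, d)))[0] for _ in range(2)]
X = np.hstack([U @ rng.standard_normal((d, n_per)) for U in bases])

Y = random_projection(X, p=10, rng=rng)      # reduce ambient dimension 50 -> 10
Y = Y / np.linalg.norm(Y, axis=0)            # assumption: unit-norm columns
Z = ssc_omp_coefficients(Y, s_max=d)
A = np.abs(Z) + np.abs(Z).T                  # symmetric affinity for spectral clustering
```

Here the projected dimension `p=10` is chosen to be on the order of the subspace dimension `d=3`, which is the regime the abstract refers to. Whether the nonzero entries of each column of `Z` fall within the correct subspace depends on the data and on `p`, and would need to be checked empirically.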
