Abstract

Sparse subspace clustering (SSC) is a state-of-the-art method for partitioning data points into a union of subspaces. However, it is not practical for large datasets because it requires solving a LASSO problem for each data point, where the number of variables in each LASSO problem is the number of data points. To improve the scalability of SSC, we propose to select a few sets of anchor points using a randomized hierarchical clustering method and, for each set of anchor points, to solve the LASSO problem for each data point while allowing only anchor points to have a non-zero weight. This generates a multilayer graph in which each layer corresponds to a set of anchor points. Using the Grassmann manifold of orthogonal matrices, the connectivity shared among the layers is summarized within a single subspace. Finally, we use k-means clustering within that subspace to cluster the data points, as done by SSC. We show on both synthetic and real-world datasets that the proposed method not only allows SSC to scale to large datasets but is also much more robust: it performs significantly better on noisy data and on data with close subspaces and outliers, and it is not prone to oversegmentation.
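The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: anchors are drawn by uniform random sampling instead of the randomized hierarchical clustering, each LASSO problem is solved with a plain proximal-gradient (ISTA) loop, and the layers are merged with one common Grassmann-style formulation (summing the layer Laplacians and subtracting a multiple of each layer's spectral projector). The function names and the parameters `lam`, `alpha`, `n_layers` and `n_anchors` are hypothetical, and dense n-by-n matrices are used only for readability, so the sketch itself does not scale.

```python
import numpy as np

def lasso_ista(A, x, lam=0.1, n_iter=200):
    """Minimise 0.5*||x - A c||^2 + lam*||c||_1 by proximal gradient (ISTA)."""
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = c - step * (A.T @ (A @ c - x))            # gradient step
        c = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return c

def layer_affinity(X, anchor_idx, lam=0.1):
    """One graph layer: represent every column of X using only anchor columns."""
    n = X.shape[1]
    A = X[:, anchor_idx]
    C = np.zeros((len(anchor_idx), n))
    for j in range(n):
        mask = anchor_idx != j                        # a point may not represent itself
        C[mask, j] = lasso_ista(A[:, mask], X[:, j], lam)
    B = np.abs(C).T                                   # n x m point-to-anchor weights
    return B @ B.T                                    # points linked through shared anchors

def normalized_laplacian(W):
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def merged_embedding(X, k, n_layers=3, n_anchors=20, alpha=0.5, seed=0):
    """Summarise all layers in a single k-dimensional subspace (assumed merge rule)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    L_mod = np.zeros((n, n))
    for _ in range(n_layers):
        # ASSUMPTION: uniform sampling stands in for randomized hierarchical clustering.
        anchors = rng.choice(n, size=n_anchors, replace=False)
        L = normalized_laplacian(layer_affinity(X, anchors))
        _, vecs = np.linalg.eigh(L)                   # eigenvalues in ascending order
        U = vecs[:, :k]                               # spectral embedding of this layer
        L_mod += L - alpha * (U @ U.T)
    _, vecs = np.linalg.eigh(L_mod)
    return vecs[:, :k]                                # rows are then clustered with k-means
```

The rows of the returned matrix would then be passed to any k-means routine to obtain the final labels, mirroring the last step of SSC.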
