Abstract

Clustering is the task of finding natural groups in datasets based on measured or perceived similarity between data points. Spectral clustering is a well-known graph-theoretic approach that can capture non-convex geometries of datasets. However, it generally becomes infeasible for large datasets because of its relatively high time and space complexity. In this paper, we propose Multi-level Approximate Spectral (MAS) clustering to enable efficient analysis of large datasets. By integrating a series of low-rank matrix approximations (i.e., approximations to the affinity matrix and its subspace, as well as to the Laplacian matrix and the Laplacian subspace), MAS achieves high computational and space efficiency. MAS provides a general framework for fast and accurate spectral clustering that works with any kernel, various fast sampling strategies, and different low-rank approximation algorithms. In addition, it can be easily extended to distributed computing. From a theoretical perspective, we provide a rigorous analysis of its approximation error, in addition to its correctness and computational complexity. Through extensive experiments, we demonstrate superior performance of the proposed method relative to several well-known approximate spectral clustering algorithms.
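To make the idea of low-rank approximation in spectral clustering concrete, the following is a minimal sketch of a generic Nyström-style approximation with uniform landmark sampling. It is not the MAS algorithm itself, whose details are not given in this abstract. The function names (`rbf_kernel`, `nystrom_spectral_embedding`, `kmeans`), the Gaussian kernel, the parameters `gamma` and `m`, and the farthest-point k-means initialisation are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Pairwise squared Euclidean distances, then Gaussian affinities.
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_spectral_embedding(X, k, m, gamma, rng):
    """Approximate spectral embedding from m sampled landmarks (assumed scheme)."""
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)   # uniform landmark sampling
    C = rbf_kernel(X, X[idx], gamma)             # n x m slice of the affinity matrix
    W = C[idx]                                   # m x m landmark-to-landmark block
    # Pseudo-inverse square root of W via its eigendecomposition.
    vals, vecs = np.linalg.eigh(W)
    keep = vals > 1e-8 * vals.max()
    W_inv_sqrt = (vecs[:, keep] / np.sqrt(vals[keep])) @ vecs[:, keep].T
    F = C @ W_inv_sqrt                           # affinity matrix is approximated by F @ F.T
    # Approximate degrees without ever forming the n x n matrix.
    deg = F @ (F.T @ np.ones(n))
    F = F / np.sqrt(np.maximum(deg, 1e-12))[:, None]
    # Left singular vectors of F are eigenvectors of the normalized approximation.
    U, _, _ = np.linalg.svd(F, full_matrices=False)
    emb = U[:, :k]
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.maximum(norms, 1e-12)        # row-normalize before k-means

def kmeans(Z, k, iters=50):
    # Farthest-point initialisation keeps this sketch deterministic.
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([((Z - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(0)
    return labels

# Usage on synthetic data: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)), rng.normal(10.0, 0.5, (100, 2))])
emb = nystrom_spectral_embedding(X, k=2, m=20, gamma=0.5, rng=rng)
labels = kmeans(emb, k=2)
```

The sketch only ever stores n x m matrices, which is the source of the time and space savings that low-rank methods aim for. The choice of sampling strategy, kernel, and factorization here is one option among many.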
