Sparse-reduced computation for large-scale spectral clustering

Philipp Baumann

doi:10.1109/ieem.2016.7798085

Abstract

Clustering is a fundamental task in machine learning and data analysis. A large number of clustering algorithms have been developed over the past decades. Among these algorithms, the recently developed spectral clustering methods have consistently outperformed traditional clustering algorithms. Spectral clustering algorithms, however, have limited applicability to large-scale problems due to their high computational complexity. We propose a new approach for scaling spectral clustering methods that replaces the entire data set with a small set of representative data points and performs the spectral clustering on those representatives. The main contribution is a new method for efficiently identifying the representative data points. Initial results indicate that the proposed scaling approach achieves high-quality clusterings and is substantially faster than existing scaling approaches.
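
The representative-based idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the representatives are identified, so the grid-cell centroids used below are a stand-in assumption and not the paper's method. The function names and the parameters `bins` and `sigma` are hypothetical, and the two-way split by the sign of the Fiedler vector is a simplification of general spectral clustering.

```python
import numpy as np

def grid_representatives(X, bins=8):
    """Assumed stand-in: one representative (centroid) per occupied grid cell."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    cells = np.minimum((bins * (X - lo) / span).astype(int), bins - 1)
    # Map every point to the index of its occupied cell.
    _, assignment = np.unique(cells, axis=0, return_inverse=True)
    assignment = assignment.ravel()
    counts = np.bincount(assignment)
    reps = np.zeros((counts.size, X.shape[1]))
    np.add.at(reps, assignment, X)          # sum the points in each cell
    reps /= counts[:, None]                 # turn sums into centroids
    return reps, assignment

def spectral_bipartition(R, sigma=1.5):
    """Two-way spectral split of the points R using a Gaussian similarity."""
    d2 = ((R[:, None, :] - R[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(R)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    fiedler = vecs[:, 1] * d_inv_sqrt       # second-smallest eigenvector
    return (fiedler > 0).astype(int)

def cluster_via_representatives(X, bins=8, sigma=1.5):
    """Cluster the representatives, then copy each label back to its points."""
    reps, assignment = grid_representatives(X, bins)
    rep_labels = spectral_bipartition(reps, sigma)
    return rep_labels[assignment], len(reps)
```

The eigendecomposition here runs on a matrix whose size is the number of representatives rather than the number of data points, which is where a representative-based approach saves time.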
