Spectral clustering is one of the most important clustering approaches, often yielding performance superior to that of other clustering approaches. However, in its original form it does not scale to large data sets, owing to the computational burden of the eigen-decomposition of a large matrix. In this paper, a two-step spectral clustering algorithm is introduced by extending recent advances in scalable spectral clustering based on a low-rank affinity matrix built from landmarks. In the first step, a scalable spectral clustering algorithm using a raw landmark-based affinity matrix is adopted. In the second step, a novel low-rank affinity matrix is learnt via probability density estimators constructed from the clusters estimated in the first step. Since prior information on cluster labels can be utilised in the second step, this learnt affinity matrix reflects intrinsic pairwise data relationships much better. While the proposed, more complicated algorithm incurs a higher computational cost than previous landmark-based spectral clustering methods, it can be shown that the associated computational cost still scales well with data size. It is demonstrated that the proposed algorithm is capable of achieving far better performance than other state-of-the-art algorithms on several benchmark multi-class image data sets.
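As a rough illustration of the landmark idea underlying the first step, the following Python sketch clusters data through a low-rank affinity matrix without ever forming the full n-by-n matrix. It is not the algorithm proposed in the paper: the random landmark selection, the Gaussian kernel with bandwidth `sigma`, the dense (non-sparsified) weights, the function names and the small farthest-point k-means are all assumptions made for illustration, and the second, density-based step is not shown.

```python
import numpy as np


def simple_kmeans(Y, k, n_iter=50):
    """Tiny Lloyd's k-means with deterministic farthest-point initialisation (illustrative helper)."""
    centers = [Y[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest already-chosen centre.
        d = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels


def landmark_spectral_clustering(X, n_landmarks, n_clusters, sigma=1.0, seed=0):
    """Generic landmark-based spectral clustering sketch (assumed design, not the paper's method)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumption: landmarks are sampled uniformly at random from the data.
    landmarks = X[rng.choice(n, size=n_landmarks, replace=False)]
    # Squared distances between every point and every landmark, shape (n, p).
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    # Gaussian weights; subtracting the row minimum avoids underflow and
    # leaves the row-normalised result unchanged.
    Z = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    Z /= Z.sum(axis=1, keepdims=True)
    # Implied low-rank affinity W = Z D^{-1} Z^T with D = diag(column sums of Z).
    # W is never formed; its leading eigenvectors are the left singular
    # vectors of Z D^{-1/2}, which costs O(n p^2) to compute.
    Z_hat = Z / np.sqrt(Z.sum(axis=0))
    U, _, _ = np.linalg.svd(Z_hat, full_matrices=False)
    return simple_kmeans(U[:, :n_clusters], n_clusters)
```

With p landmarks and n data points, the singular value decomposition acts on an n-by-p matrix, so the cost grows linearly in n for fixed p, which is the sense in which landmark-based methods scale.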