Multiple kernel clustering (MKC) has recently achieved remarkable progress in fusing multisource information to boost clustering performance. However, its O(n²) memory consumption and O(n³) computational complexity, where n denotes the number of samples, prevent these methods from being applied to medium- or large-scale problems. To address these issues, we carefully redesign the formulation of subspace segmentation-based MKC, reducing the memory and computational complexity to O(n) and O(n²), respectively. The proposed algorithm adopts a novel sampling strategy that both improves the performance and speeds up MKC. Specifically, we first model the sampling process mathematically and then learn it jointly with the information fusion procedure. In this way, the generated anchor point set better serves data reconstruction across different views, which improves the discriminative capability of the reconstruction matrix and boosts clustering performance. Although the integrated sampling process makes the proposed algorithm less efficient than linear-complexity algorithms, its formulation is straightforward to parallelize. With GPU and multicore acceleration, our algorithm outperforms the compared state-of-the-art methods on six datasets at a time cost comparable to that of the linear-complexity algorithms.
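To make the memory argument concrete, the following is a minimal sketch, not the proposed algorithm: it reconstructs each view against m anchor points instead of all n samples, so the stored block has n × m entries rather than n × n. Everything in it is an illustrative assumption: the RBF kernel, the function names `rbf`, `anchor_kernel`, and `fuse`, the uniform averaging across views, and in particular the uniform random anchor sampling, which stands in for the learned sampling that the abstract describes.

```python
import math
import random


def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as lists of floats."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def anchor_kernel(points, anchor_idx, gamma=1.0):
    """n x m cross-kernel block between all points and the chosen anchors."""
    return [[rbf(p, points[j], gamma) for j in anchor_idx] for p in points]


def fuse(blocks):
    """Uniform average of per-view n x m blocks (placeholder for learned weights)."""
    n, m, v = len(blocks[0]), len(blocks[0][0]), len(blocks)
    return [[sum(b[i][j] for b in blocks) / v for j in range(m)] for i in range(n)]


random.seed(0)
n, m = 200, 10  # n samples, m anchors (hypothetical sizes)

# Two synthetic views of the same n samples, with 3 and 5 features.
views = [[[random.random() for _ in range(d)] for _ in range(n)] for d in (3, 5)]

# Assumption: uniform random sampling, NOT the learned sampling of the paper.
anchor_idx = random.sample(range(n), m)

fused = fuse([anchor_kernel(v, anchor_idx) for v in views])

full_entries = n * n                        # entries in a full n x n kernel
anchor_entries = len(fused) * len(fused[0])  # entries in the n x m block
print(full_entries, anchor_entries)          # prints "40000 2000"
```

For a fixed number of anchors m, the stored block grows linearly in n, which is the intuition behind the O(n) memory figure; how the anchors are learned and how the views are weighted are left to the full paper.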