Abstract

Spectral clustering is an effective clustering method, owing to its excellent performance in partitioning non-linearly distributed data. Nevertheless, it scales poorly because of the high computational cost of constructing the graph Laplacian and computing its eigendecomposition. Many efforts have been made over the past decades to address this problem, yet the computational bottleneck remains when processing massive data, especially in web-scale scenarios. We present LiteWSC, a simple yet efficient lightweight spectral clustering framework that clusters web-scale data with limited resource requirements. Our framework has minimal space overhead and does not require explicitly computing the embeddings of the whole dataset. We also analyze the theoretical guarantees and performance bounds of LiteWSC. LiteWSC is highly flexible, with an O(sp) memory requirement, where s and p are the number of samples and the number of prototypes, respectively, both of which can be adapted to the available resources. Therefore, LiteWSC can partition web-scale data (e.g., $n = 8{,}000\text{k}$) on a resource-limited host (e.g., with memory restricted to 1 GB). Experiments on real-world large-scale and web-scale datasets demonstrate both the efficiency and the effectiveness of LiteWSC over state-of-the-art methods.
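To make the O(sp) memory bound concrete, the sketch below computes how large s can be for a given number of prototypes p under a fixed memory budget. It is an illustrative back-of-the-envelope calculation only, not LiteWSC itself. It assumes, hypothetically, that the dominant memory cost is a single dense s × p matrix of 8-byte floats, for example sample-to-prototype affinities as in anchor-graph methods, and it ignores all constant factors. The helper name `max_samples_per_batch` is hypothetical, and LiteWSC's actual constants and data structures may differ.

```python
def max_samples_per_batch(budget_bytes: int, num_prototypes: int, bytes_per_entry: int = 8) -> int:
    """Largest s such that a dense s x p matrix fits in the budget.

    Hypothetical helper: assumes memory is dominated by one dense s x p
    matrix of fixed-size entries (8 bytes = float64 by default), with all
    other overheads ignored.
    """
    if num_prototypes <= 0 or bytes_per_entry <= 0:
        raise ValueError("num_prototypes and bytes_per_entry must be positive")
    # O(sp) memory: s * p * bytes_per_entry <= budget_bytes
    return budget_bytes // (num_prototypes * bytes_per_entry)


if __name__ == "__main__":
    GIB = 1024 ** 3  # a 1 GB (GiB) budget, as in the resource-limited setting
    for p in (100, 1_000, 10_000):
        s = max_samples_per_batch(GIB, p)
        print(f"p = {p:>6}: s <= {s:,}")
```

Under the same assumption, a dense n × p matrix for n = 8,000,000 and p = 1,000 would need 8×10^6 × 10^3 × 8 bytes = 6.4×10^10 bytes (roughly 64 GB). This illustrates why a bound in terms of s rather than n matters on a 1 GB host.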
