Abstract

Spectral Clustering (SC) is an effective clustering method because it partitions non-linearly distributed data well. Ensemble Clustering (EC), a different clustering technique, improves cluster quality by combining the results of multiple base clusterings. In this work, we concentrate on an EC framework that uses SC as the base method. However, SC scales poorly because constructing the graph Laplacian and computing its eigendecomposition are computationally expensive. Despite many efforts over the past decades to address this, SC still struggles with extensive data, especially in web-scale scenarios. Additionally, EC requires multiple clustering results as the ensemble bases, which further aggravates resource consumption. To address these issues, we propose LiteWSEC, a simple yet efficient Lightweight Framework for Web-scale Spectral Ensemble Clustering, which clusters web-scale data with limited resource requirements. It adopts Web-scale Spectral Clustering (WSC) as the base method, which has minimal space overhead because it does not compute the overall embedding explicitly. LiteWSEC's memory requirement is highly flexible and adapts to the available resources. It can partition web-scale data (e.g., $n = 8{,}000\,\mathrm{k}$) on a resource-limited host (e.g., memory restricted to 1 GB). Experiments on real-world, large-scale, and web-scale datasets demonstrate both the efficiency and effectiveness of LiteWSEC over state-of-the-art SC and EC methods.
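
To make the idea of ensembling base clusterings concrete, here is a minimal sketch of one generic consensus scheme, co-association (evidence accumulation) voting. It is not the LiteWSEC algorithm, whose details the abstract does not give. The function names `co_association` and `consensus_clusters`, the `threshold` parameter, and the hard-coded base labelings (stand-ins for the outputs of several SC runs) are all assumptions made for illustration.

```python
from itertools import combinations


def co_association(base_labelings):
    """Count, for every pair of points, how many base clusterings co-cluster them."""
    n = len(base_labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in base_labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return counts


def consensus_clusters(base_labelings, threshold=0.5):
    """Link pairs co-clustered in more than `threshold` of the bases,
    then return the connected components as consensus labels."""
    counts = co_association(base_labelings)
    m = len(base_labelings)
    n = len(counts)
    parent = list(range(n))  # union-find forest over the points

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if counts[i][j] / m > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical base results: the second run permutes the label names,
# and the third run misassigns point 2.
bases = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
print(consensus_clusters(bases))
```

The pairwise table in this sketch needs memory quadratic in the number of points, and every base clustering must be computed first. That illustrates why a naive ensemble over SC becomes resource-hungry at web scale.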
