Abstract

Clustering large-scale datasets has received increasing attention recently. However, existing algorithms are still not efficient in scenarios with extremely large number of clusters. To this end, Adaptive Graph K-Means (AGKM) is proposed in this work. Its idea originates from k-means, but it operates on an adaptive k-Nearest Neighbor (k-NN) graph instead of data features. First, AGKM is highly efficient for processing datasets where both the numbers of samples and clusters are very large. Specifically, the time and space complexity are both linear w.r.t the number of samples and, more importantly, independent to the cluster number. Second, AGKM is designed for balanced clusters. This constraint is realized by adding a regularization term in loss function, and a simple modification of the graph in optimization algorithm, which does not increase the computational burden. Last, the indicator and dissimilarity matrices are learned simultaneously, so that the proposed AGKM obtains the final partition directly with higher efficacy and efficiency. Experiments on several datasets validate the advantages of AGKM. In particular, over 29X and 46X speed-ups with respect to k-means are observed on the two large-scale datasets WebFace and CelebA, respectively.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.