Abstract
With the increasing volume of data such as images on the web, it is challenging to perform k-means clustering on millions or even billions of images efficiently. One reason is that k-means requires a full batch of training data to update the cluster centers at every iteration, which is time-consuming. Conventionally, k-means is accelerated by using a single instance or a mini-batch of instances to update the centers, but this degrades clustering quality due to stochastic noise. In this paper, we reduce such stochastic noise, and thereby accelerate k-means, by using a variance reduction technique. Specifically, we propose a position correction mechanism to correct the drift of the cluster centers, yielding a variance-reduced k-means named VRKM. Furthermore, we optimize VRKM by reducing its computational cost, and propose a new variant named VRKM++. Compared with VRKM, VRKM++ does not have to compute the batch gradient, and is therefore more efficient. Extensive empirical studies show that VRKM and VRKM++ outperform the state-of-the-art method, obtaining about 2× and 4× speedups for large-scale clustering, respectively. The source code is available at https://www.github.com/YaweiZhao/VRKM_sofia-ml .
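The core idea described above, applying SVRG-style variance reduction to mini-batch k-means updates, can be sketched as follows. This is a hypothetical illustration of the general technique, not the authors' exact VRKM algorithm: the function name `vr_minibatch_kmeans` and all parameter choices (`epochs`, `batch`, `lr`, `init`) are assumptions for the sake of the example. Each epoch takes a snapshot of the centers, computes the full-batch gradient at the snapshot, and then corrects each noisy mini-batch step with the difference between the mini-batch gradient at the snapshot and the full-batch gradient.

```python
import numpy as np


def vr_minibatch_kmeans(X, k, epochs=10, batch=64, lr=0.05, init=None, seed=0):
    """SVRG-style variance-reduced mini-batch k-means (illustrative sketch).

    Hypothetical example of variance reduction for k-means, not the
    paper's exact VRKM/VRKM++ algorithm.
    """
    rng = np.random.default_rng(seed)
    # Initialize centers from given points or a random sample of the data.
    C = init.copy() if init is not None else X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(epochs):
        C_snap = C.copy()  # snapshot centers for this epoch
        # Full-batch gradient at the snapshot: for each center, its
        # displacement from the mean of the points assigned to it.
        a_snap = np.argmin(((X[:, None] - C_snap[None]) ** 2).sum(-1), axis=1)
        G_full = np.zeros_like(C)
        for j in range(k):
            pts = X[a_snap == j]
            if len(pts):
                G_full[j] = C_snap[j] - pts.mean(0)
        for _ in range(len(X) // batch):
            idx = rng.choice(len(X), batch, replace=False)
            B = X[idx]
            # Mini-batch assignments under the current and snapshot centers.
            a = np.argmin(((B[:, None] - C[None]) ** 2).sum(-1), axis=1)
            a_s = a_snap[idx]
            for j in range(k):
                g = C[j] - B[a == j].mean(0) if (a == j).any() else 0.0
                g_s = C_snap[j] - B[a_s == j].mean(0) if (a_s == j).any() else 0.0
                # Variance-reduced step: noisy gradient, minus its value at
                # the snapshot, plus the exact full-batch gradient there.
                C[j] -= lr * (g - g_s + G_full[j])
    return C
```

Because the correction term `g - g_s` vanishes in expectation while cancelling most of the mini-batch noise, the update behaves like a full-batch step at mini-batch cost once the centers stabilize.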