Parallel and accurate k‐means algorithm on CPU‐GPU architectures for spectral clustering

Guanlin He, Marc Baboulin, Stephane Vialle

doi:10.1002/cpe.6621

Abstract

k‐Means is a standard algorithm for clustering data. It is generally the final step in a more complex chain of high‐quality spectral clustering. However, this chain scales poorly to large datasets. This can be overcome by also applying the k‐means algorithm as a preprocessing step to reduce the number of input data instances. We propose parallel optimization techniques for the k‐means algorithm on CPU and GPU. In particular, we use a two‐step summation method with package processing to limit the effect of rounding errors that may occur when updating cluster centroids. Our experiments on synthetic and real‐world datasets containing millions of instances show a speedup of up to 7 for the k‐means iteration time on GPU versus 20/40 CPU threads using AVX units, and achieve double‐precision accuracy with single‐precision computations.
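
The rounding-error issue mentioned above can be illustrated with a minimal sketch. It is not the authors' CPU/GPU implementation: it only shows the general idea of summing in two steps (first within fixed-size packages, then over the package partial sums) so that each single-precision addition combines values of similar magnitude. The helper names (f32, naive_sum_f32, two_step_sum_f32), the package size of 1000, and the test data are all assumptions made for illustration. Single precision is emulated with the standard struct module.

```python
import math
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_sum_f32(values):
    """Accumulate left to right, rounding to single precision after each add."""
    acc = 0.0
    for v in values:
        acc = f32(acc + v)
    return acc


def two_step_sum_f32(values, package_size):
    """Step 1: sum each package separately. Step 2: sum the package results.

    package_size is a hypothetical tuning parameter, not a value from the paper.
    """
    partials = []
    for start in range(0, len(values), package_size):
        partials.append(naive_sum_f32(values[start:start + package_size]))
    return naive_sum_f32(partials)


# Illustrative data: one million copies of 0.1 rounded to single precision.
values = [f32(0.1)] * 1_000_000

# Double-precision reference computed with an exactly rounded summation.
reference = math.fsum(values)

err_naive = abs(naive_sum_f32(values) - reference)
err_two_step = abs(two_step_sum_f32(values, 1000) - reference)
print("naive error:   ", err_naive)
print("two-step error:", err_two_step)
```

In the naive loop the accumulator grows far larger than each addend, so every addition loses low-order bits. Summing per package keeps the accumulator small in the first step, and the second step performs far fewer additions.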

Full Text