A Novel Density Based Clustering Algorithm and its Parallelization

Xiaokang Li, Binbin Yu, Yinghua Zhou, Guangzhong Sun

doi:10.1109/pdcat.2014.9

Abstract

K-Means, a simple but effective clustering algorithm, is widely used in the data mining, machine learning, and computer vision communities. The k-Means algorithm consists of two steps: initialization of the cluster centers and iteration. The initial cluster centers have a great impact on both the clustering result and the efficiency of the algorithm. More appropriate initial centers bring k-Means closer to the optimal solution and lead to much quicker convergence. In this paper, we propose a novel density-based clustering algorithm, Kmms, whose name abbreviates k-Means and Mean Shift. Experiments show that our algorithm not only requires less initialization time than other density-based algorithms, but also achieves better clustering quality and higher efficiency. Compared with the popular k-Means++ algorithm, our method achieves comparable accuracy, and in most cases better accuracy. Furthermore, we parallelize the Kmms algorithm with OpenMP in both the initialization and iteration steps, and prove the convergence of the algorithm.
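
To make the two-step structure of k-Means concrete, the following is a minimal sketch of the standard Lloyd iteration started from caller-supplied initial centers. It is not the Kmms algorithm, whose initialization is not specified in this abstract; the function names (`assign`, `update`, `kmeans`) and the toy data in the usage note are assumptions chosen only for illustration.

```python
import math


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def update(points, labels, centers):
    """Move each center to the mean of the points assigned to it."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            # An empty cluster keeps its previous center (illustrative choice).
            new_centers.append(old)
            continue
        new_centers.append(tuple(sum(p[d] for p in members) / len(members)
                                 for d in range(len(old))))
    return new_centers


def kmeans(points, init_centers, max_iter=100):
    """Run Lloyd's iteration from the given initial centers.

    Returns (centers, labels, iterations_used). The loop stops as soon as
    an update leaves every point's assignment unchanged.
    """
    centers = [tuple(c) for c in init_centers]
    labels = assign(points, centers)
    for it in range(1, max_iter + 1):
        centers = update(points, labels, centers)
        new_labels = assign(points, centers)
        if new_labels == labels:
            return centers, labels, it
        labels = new_labels
    return centers, labels, max_iter
```

On a hypothetical toy set such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]`, calling `kmeans` once with one initial center in each group and once with both initial centers in the same group reaches the same partition, but the second start needs an extra iteration. This is a small-scale instance of the sensitivity to initialization described above.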

Full Text