Improvement and Parallelism of k-Means Clustering Algorithm

Jinlan Tian, Lin Zhu, Suqin Zhang, Lu Liu

doi:10.1016/s1007-0214(05)70069-9

Open Access

https://doi.org/10.1016/s1007-0214(05)70069-9

Journal: Tsinghua Science & Technology
Publication Date: Jun 1, 2005
Citations: 41
License type: implied-oa

Affiliation: Tsinghua University

Abstract

The k-means clustering algorithm is one of the most commonly used algorithms for cluster analysis. The traditional k-means algorithm is, however, inefficient on large data sets, and improving its efficiency remains an open problem. This paper focuses on the efficiency of clustering algorithms. A method for refining the initial cluster centers is designed to reduce the number of iterations the algorithm requires. A parallel k-means algorithm is also studied to overcome the processing limits of a single-processor machine when given huge data sets. The analytical results demonstrate that these improvements can greatly enhance the efficiency of the k-means algorithm, allowing large data sets to be clustered more accurately and more quickly. The analysis has theoretical and practical importance for work on improving and parallelizing clustering algorithms.
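To make the two efficiency levers in the abstract concrete, the following is a minimal sketch of the standard (Lloyd's) k-means iteration with a data-parallel assignment step. It is not the authors' refined-initialization method or their parallel design, neither of which is specified in the abstract. The function names (`nearest`, `partial_sums`, `kmeans`), the `workers` parameter, and the use of a thread pool are all assumptions made for illustration. The sketch shows only that the iteration count depends on the initial centers passed in, and that the assignment step can be split into independent chunks whose partial sums are merged afterwards.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])),
    )


def partial_sums(chunk, centers):
    """Per-worker step: assign each point in a chunk, accumulate cluster sums."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for point in chunk:
        j = nearest(point, centers)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def kmeans(points, centers, workers=2, max_iter=100):
    """Lloyd's iteration; returns (final_centers, iterations_used)."""
    centers = [list(c) for c in centers]
    # Hypothetical partitioning: round-robin split of the data across workers.
    chunks = [points[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for iteration in range(1, max_iter + 1):
            results = list(pool.map(lambda ch: partial_sums(ch, centers), chunks))
            new_centers = []
            for j, old in enumerate(centers):
                count = sum(r[1][j] for r in results)
                if count == 0:
                    new_centers.append(old)  # keep the center of an empty cluster
                    continue
                new_centers.append(
                    [sum(r[0][j][d] for r in results) / count for d in range(len(old))]
                )
            if new_centers == centers:  # centers stopped moving: converged
                return centers, iteration
            centers = new_centers
    return centers, max_iter


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_centers, good_iters = kmeans(points, [(0, 0), (10, 10)])
poor_centers, poor_iters = kmeans(points, [(0, 0), (0, 1)])
```

On this toy data both starting points reach the same final centers, but the run that starts with both centers inside one group needs at least as many iterations as the well-separated start. In CPython a thread pool does not speed up this CPU-bound loop because of the global interpreter lock; the threads here only illustrate the structure of the split-and-merge step, and a process pool or a message-passing setup would be needed for an actual speedup.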

Full Text