An efficient k-means clustering algorithm using simple partitioning

Ming-Chuan Hung, Jin-Hua Chang, Jungpin Wu, Don-Lin Yang

doi:10.6688/jise.2005.21.6.4

Abstract

The k-means algorithm is one of the most widely used methods for partitioning a dataset into groups of patterns. However, most k-means methods require expensive distance calculations against the centroids to reach convergence. In this paper, we present an efficient implementation of k-means clustering that produces clusters comparable to those of slower methods. Our algorithm partitions the original dataset into blocks; each block, called a unit block (UB), contains at least one pattern. The centroid of a unit block (CUB) can be located with a simple calculation. Together, the computed CUBs form a reduced dataset that represents the original dataset. The reduced dataset is then used to compute the final centroids of the original dataset. Only the UBs on the boundaries of candidate clusters need to be examined to find the closest final centroid for every pattern they contain. In this way, the time needed to compute the final converged centroids is reduced dramatically. In our experiments, this algorithm produces clustering results comparable to those of other k-means algorithms, but with much better performance.
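The reduction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the blocks are laid out or how the reduced dataset is clustered, so the sketch assumes a uniform grid over the bounding box of the data and ordinary Lloyd iterations in which each CUB is weighted by the number of patterns in its block. The function names `unit_block_centroids` and `weighted_kmeans`, the `cells_per_dim` parameter, and the caller-supplied initial centroids are all hypothetical. The boundary-UB refinement step is omitted.

```python
from collections import defaultdict
import math


def unit_block_centroids(points, cells_per_dim):
    """Return (centroid, count) for every non-empty grid cell (unit block).

    Assumption: blocks are cells of a uniform grid over the bounding box.
    """
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    # Cell width per dimension; fall back to 1.0 if all values coincide.
    width = [((hi[d] - lo[d]) / cells_per_dim) or 1.0 for d in range(dims)]

    sums = {}
    counts = defaultdict(int)
    for p in points:
        # Clamp so that points on the upper edge land in the last cell.
        key = tuple(
            min(int((p[d] - lo[d]) / width[d]), cells_per_dim - 1)
            for d in range(dims)
        )
        acc = sums.setdefault(key, [0.0] * dims)
        for d in range(dims):
            acc[d] += p[d]
        counts[key] += 1
    return [(tuple(v / counts[k] for v in sums[k]), counts[k]) for k in sums]


def weighted_kmeans(cubs, initial_centroids, max_iter=100):
    """Lloyd iterations over CUBs, each weighted by its block's pattern count."""
    centroids = [tuple(c) for c in initial_centroids]
    dims = len(centroids[0])
    for _ in range(max_iter):
        sums = [[0.0] * dims for _ in centroids]
        weights = [0] * len(centroids)
        for cub, n in cubs:
            # Assign the whole block to its nearest current centroid.
            j = min(range(len(centroids)),
                    key=lambda i: math.dist(cub, centroids[i]))
            weights[j] += n
            for d in range(dims):
                sums[j][d] += n * cub[d]
        # Keep the old centroid if no block was assigned to it.
        new = [
            tuple(v / weights[j] for v in sums[j]) if weights[j] else centroids[j]
            for j in range(len(centroids))
        ]
        if new == centroids:
            break
        centroids = new
    return centroids
```

Because each iteration measures distances from the CUBs rather than from every pattern, its cost scales with the number of non-empty blocks instead of the dataset size. In this sketch, a finer grid gives a closer approximation of the full dataset at the price of more blocks.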
