r-Reference points based k-means algorithm

Ching-Lin Wang, Yung-Kuan Chan, Shao-Wei Chu, Shyr-Shen Yu

doi:10.1016/j.ins.2022.07.166

Abstract

The k-means algorithm is a simple and effective, but time-consuming, technique for clustering large amounts of data. In the traditional k-means algorithm, most of the cost of assigning a datum to its nearest cluster is spent computing the distances between the datum and all cluster centers. Many cluster centers, however, are relatively far from the datum. The datum cannot belong to these distant clusters, so its distances to their centers need not be recomputed in every iteration. In this research, an r-reference points based k-means (r-RPKM) algorithm is proposed that excludes the distant centers, so that only the distances from the datum to the nearer cluster centers are computed in each iteration. The traditional k-means algorithm is sensitive to the choice of initial cluster centers, so a maximal distance based initial cluster center (MDICC) decider is also presented to provide a set of initial cluster centers and thereby accelerate the r-RPKM algorithm. The experimental results demonstrate that the r-RPKM algorithm executes much faster than the traditional k-means algorithm while producing the same clustering result.
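
The abstract does not say how the reference points are used, so the following is only an illustrative sketch of the general idea, not the authors' r-RPKM or MDICC procedures. It assumes that distant centers are excluded with the reverse triangle inequality, |d(x, r) - d(c, r)| <= d(x, c), and it uses a plain farthest-first rule as a stand-in for a maximal-distance initializer. The names `farthest_first_centers`, `assign_with_pruning`, and `refs` are hypothetical.

```python
import math


def farthest_first_centers(points, k):
    """Pick k initial centers by repeatedly taking the point farthest
    from the centers chosen so far (an assumed stand-in, not MDICC)."""
    centers = [points[0]]
    # Distance from every point to its nearest chosen center.
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(n, math.dist(p, points[idx]))
                   for n, p in zip(nearest, points)]
    return centers


def assign_with_pruning(points, centers, refs):
    """Assign each point to its nearest center, skipping centers that a
    reference-point lower bound shows cannot beat the current best.

    Returns (labels, number of point-to-center distances computed).
    """
    # Distances to the reference points are computed once per pass.
    point_to_ref = [[math.dist(p, r) for r in refs] for p in points]
    center_to_ref = [[math.dist(c, r) for r in refs] for c in centers]

    labels, computed = [], 0
    for p, pr in zip(points, point_to_ref):
        best, best_d = -1, math.inf
        for j, (c, cr) in enumerate(zip(centers, center_to_ref)):
            # Reverse triangle inequality: d(p, c) >= |d(p, r) - d(c, r)|.
            lower = max(abs(a - b) for a, b in zip(pr, cr))
            if lower >= best_d:
                continue  # center j cannot be strictly closer; skip it
            d = math.dist(p, c)
            computed += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, computed
```

Calling `assign_with_pruning(points, centers, refs)` inside an otherwise standard k-means loop leaves the assignments unchanged in exact arithmetic, because a center is skipped only when its lower bound already rules it out. How many distances are saved depends on where the reference points lie relative to the data.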

Full Text