Improved initial cluster center selection in K-means clustering

Minchen Zhu,Weizhi Wang,Jingshan Huang,Teen-Hang Meen,Steven D Prior,Artde Donald Kin-Tak Lam

doi:10.1108/ec-11-2012-0288

Abstract

Purpose – It is well known that the selection of initial cluster centers can significantly affect K-means clustering results. The purpose of this paper is to propose an improved, efficient methodology to handle such a challenge. Design/methodology/approach – According to the fact that the inner-class distance among samples within the same cluster is supposed to be smaller than the inter-class distance among clusters, the algorithm will dynamically adjust initial cluster centers that are randomly selected. Consequently, such adjusted initial cluster centers will be highly representative in the sense that they are distributed among as many samples as possible. As a result, local optima that are common in K-means clustering can then be effectively reduced. In addition, the algorithm is able to obtain all initial cluster centers simultaneously (instead of one center at a time) during the dynamic adjustment. Findings – Experimental results demonstrate that the proposed algorithm greatly improves the accuracy of traditional K-means clustering results and, in a more efficient manner. Originality/value – The authors presented in this paper an efficient algorithm, which is able to dynamically adjust initial cluster centers that are randomly selected. The adjusted centers are highly representative, i.e. they are distributed among as many samples as possible. As a result, local optima that are common in K-means clustering can be effectively reduced so that the authors can achieve an improved clustering accuracy. In addition, the algorithm is a cost-efficient one and the enhanced clustering accuracy can be obtained in a more efficient manner compared with traditional K-means algorithm.

Full Text