A New Initialization Method for K-means Algorithm Based on Clustering Coefficient

Zhonghua Jiang

doi:10.1088/1742-6596/1992/4/042060

Abstract

The traditional k-means algorithm has been widely used as a simple and efficient clustering method. However, it selects its initial cluster centers at random, so different runs can produce different clustering results. In this paper, a novel method for selecting the initial cluster centers on the basis of a complex network is proposed. The data set to be clustered is represented as a complex network. The clustering coefficients of a vertex and of its adjacent vertices are used to introduce the novel concepts of “valley point” and “peak point,” from which a set of candidate initial cluster centers is constructed. Finally, an algorithm for selecting the initial cluster centers from this candidate set is proposed. The time complexity of the proposed algorithm is O(n^2), where n is the number of data points. The proposed algorithm is applied to four data sets of different dimensionality to compute the initial cluster centers for the k-means algorithm. Compared with the random initialization and maximin methods, the initial centers obtained by the proposed algorithm yield better k-means clustering performance.
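The abstract names two building blocks that can be illustrated without guessing the paper's exact procedure: turning a data set into a network and computing each vertex's local clustering coefficient, plus the maximin baseline it is compared against. The sketch below is a minimal illustration under stated assumptions, not the author's method. It assumes an epsilon-neighbourhood graph (two points are linked when their Euclidean distance is below a hypothetical threshold `eps`); the paper's actual network construction may differ. The function names are hypothetical, maximin is assumed to start from the first data point, and the valley/peak candidate selection is not reproduced because the abstract does not define it.

```python
import math
from itertools import combinations


def build_network(points, eps):
    """Hypothetical epsilon-neighbourhood graph: link two points if their
    Euclidean distance is below eps. Checking all pairs costs O(n^2)."""
    adj = [set() for _ in points]
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) < eps:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def clustering_coefficient(adj, v):
    """Local clustering coefficient of vertex v: the fraction of pairs of
    v's neighbours that are themselves linked (0.0 if degree < 2)."""
    neighbours = adj[v]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def maximin_init(points, k):
    """Maximin baseline (assumed to start from the first point): repeatedly
    add the point farthest from its nearest already-chosen center."""
    chosen = [0]
    # Distance from every point to its nearest chosen center so far.
    nearest = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return [points[i] for i in chosen]


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (10, 10)]
    adj = build_network(pts, eps=1.5)
    print([clustering_coefficient(adj, v) for v in range(len(pts))])
    print(maximin_init(pts, 2))
```

In this toy data set the first three points form a closed triangle, so each of them has a clustering coefficient of 1.0, while the isolated far point has 0.0; maximin picks the first point and then the far one. Comparing a vertex's coefficient with those of its neighbours, as the abstract describes, would be layered on top of `clustering_coefficient`.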
