Application of an improved K-means algorithm in gene expression data analysis

Qian Ren, Xinjian Zhuo

doi:10.1109/isb.2011.6033126

Abstract

The K-means algorithm is one of the classic partitioning methods in clustering. Its result, however, varies with the choice of the initial cluster centers. Motivated by this, an improved K-means algorithm is proposed based on Kruskal's algorithm, a well-known method from graph theory. The procedure is as follows. First, the minimum spanning tree (MST) of the objects to be clustered is constructed with Kruskal's algorithm. Then the K-1 edges with the largest weights are deleted, which splits the MST into K connected subgraphs. Finally, the mean of the objects in each of the K connected subgraphs is taken as an initial cluster center, and K-means is run from these centers. When the improved algorithm is applied to gene expression data analysis, simulation experiments show that it achieves better clustering results and higher efficiency than the traditional K-means algorithm.
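The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the distance measure or how ties are broken, so Euclidean distance on a complete graph is assumed here, and all function names (`kruskal_mst`, `mst_initial_centers`, `kmeans`) are hypothetical. It also assumes 1 <= k <= number of points.

```python
import math
from itertools import combinations


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def kruskal_mst(points):
    """Return MST edges (weight, i, j) of the complete Euclidean graph (assumed metric)."""
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    parent = list(range(len(points)))
    mst = []
    for w, i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:  # edge joins two different trees, so it creates no cycle
            parent[ri] = rj
            mst.append((w, i, j))
    return mst


def mean(rows):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def mst_initial_centers(points, k):
    """Drop the k-1 heaviest MST edges and return the mean of each component."""
    mst = sorted(kruskal_mst(points))      # ascending by weight
    kept = mst[: len(mst) - (k - 1)]       # discard the k-1 heaviest edges
    parent = list(range(len(points)))
    for _, i, j in kept:
        parent[find(parent, i)] = find(parent, j)
    groups = {}
    for idx, p in enumerate(points):
        groups.setdefault(find(parent, idx), []).append(p)
    return [mean(g) for g in groups.values()]


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Toy data standing in for expression profiles: two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
initial = mst_initial_centers(points, k=2)
centers, clusters = kmeans(points, initial)
```

On this toy data the single long MST edge between the two groups is the one removed, so the initial centers already coincide with the group means and K-means stops after its first iteration. Building the complete graph costs O(n²) edges, so for large gene expression matrices a sparser neighbor graph may be preferable; the abstract does not say which the authors used.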
