Clustering Microarray Data by Using a Stochastic Algorithm

Ho Sun Shon, Keun Ho Ryu, Seung Jung Shin, Sunshin Kim

doi:10.1109/cit.2008.workshops.117

Abstract

The clustering of gene expression data is used to analyze the results of microarray studies. This method is often useful in understanding how a particular class of genes functions together during a biological process. In this study, we performed clustering using the Markov cluster (MCL) algorithm, a clustering method for graphs based on the simulation of stochastic flow. It is a fast and efficient algorithm that clusters the nodes of a graph by simulating flow and computing the associated probabilities. First, we converted the raw expression matrix into a sample-by-sample matrix by computing the Euclidean distance between samples over their gene expression values. Second, we applied the MCL algorithm to this new Euclidean distance matrix and considered two factors, namely the inflation and the diagonal terms of the matrix. We tuned these factors through extensive experiments. In addition, distance thresholds, defined as the average of the elements in each column, were used to distinguish clearly between groups. Our experiments show an average accuracy of about 70% relative to the previously known classes. We also compared the MCL algorithm with the self-organizing map (SOM), K-means, and hierarchical clustering (HC) algorithms.
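The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how distances are turned into graph edges, so the sketch assumes an unweighted edge between two samples whenever their distance falls below the column average, and it treats the "diagonal terms" as a self-loop weight. The function names (`sample_distances`, `thresholded_graph`, `mcl`) and the default parameter values are hypothetical.

```python
import math

def sample_distances(expr):
    """expr[g][s] is the expression of gene g in sample s.
    Returns a samples x samples matrix of Euclidean distances."""
    n = len(expr[0])
    samples = [[row[s] for row in expr] for s in range(n)]
    return [[math.dist(a, b) for b in samples] for a in samples]

def thresholded_graph(dist, diagonal=1.0):
    """Assumption: keep an edge i-j when the distance is below the mean of
    the corresponding column; 'diagonal' is the self-loop weight (must be > 0)."""
    n = len(dist)
    col_mean = [sum(dist[i][j] for i in range(n)) / n for j in range(n)]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                adj[i][j] = diagonal
            elif dist[i][j] < col_mean[j] or dist[j][i] < col_mean[i]:
                adj[i][j] = 1.0  # symmetric by construction
    return adj

def normalize_columns(m):
    """Rescale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]

def mcl(adj, inflation=2.0, max_iter=100, tol=1e-9):
    """Alternate expansion (matrix squaring) and inflation (elementwise power
    followed by column normalization) until the matrix stops changing."""
    n = len(adj)
    m = normalize_columns(adj)
    for _ in range(max_iter):
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Each row with nonzero entries lists the members of one cluster.
    clusters = []
    for row in m:
        members = sorted(j for j, x in enumerate(row) if x > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters

# Toy data (made up for illustration): 2 genes x 6 samples, two obvious groups.
toy = [[0.0, 0.1, 0.2, 10.0, 10.1, 10.2],
       [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]]
print(mcl(thresholded_graph(sample_distances(toy))))
```

In this sketch, a larger `inflation` value tends to produce more and smaller clusters, while a larger `diagonal` keeps more flow at each node's own position. These are the two factors the abstract reports tuning experimentally.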
