Mining of gene expression data is growing rapidly as a way to predict gene expression patterns and to assist clinicians in the early diagnosis of tumor formation. Clustering is the most important phase of this process, because it helps find groups of genes that are highly expressed or suppressed. This paper analyses the performance of the most representative hard and soft off-line clustering algorithms, namely K-Means, Fuzzy C-Means, Self-Organizing Map (SOM) based clustering and Genetic Algorithm (GA) based clustering, on a brain tumor gene expression dataset. The clusters produced by these algorithms are indications of cellular processes. Clustering results are evaluated for cluster compactness and separation using the Xie-Beni index (XB), Davies-Bouldin index (DB), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and Dunn's Index (DI), along with the time taken. Experimental results show that soft clustering approaches work well for predicting clusters of highly expressed and suppressed genes.
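The abstract names its validity indices without defining them. As an illustration only, the sketch below computes two of them, the Davies-Bouldin index (lower is better) and Dunn's Index (higher is better), using their standard textbook definitions with Euclidean distance. The function names and the toy two-dimensional points are hypothetical and are not taken from the paper or its brain tumor dataset.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def davies_bouldin(clusters):
    """Standard Davies-Bouldin index: mean over clusters of the worst
    (scatter_i + scatter_j) / centroid_distance ratio. Lower is better."""
    cents = [centroid(c) for c in clusters]
    scatter = [
        sum(euclidean(p, cents[i]) for p in c) / len(c)
        for i, c in enumerate(clusters)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (scatter[i] + scatter[j]) / euclidean(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


def dunn(clusters):
    """Standard Dunn index: smallest between-cluster point distance
    divided by the largest within-cluster diameter. Higher is better."""
    min_between = min(
        euclidean(p, q)
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
        for p in a
        for q in b
    )
    max_diameter = max(euclidean(p, q) for c in clusters for p in c for q in c)
    return min_between / max_diameter


# Hypothetical toy data: the same eight points partitioned two ways.
compact = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    [(10, 10), (10, 11), (11, 10), (11, 11)],
]
mixed = [
    [(0, 0), (0, 1), (10, 10), (10, 11)],
    [(1, 0), (1, 1), (11, 10), (11, 11)],
]

for name, part in (("compact", compact), ("mixed", mixed)):
    print(name, "DB =", davies_bouldin(part), "Dunn =", dunn(part))
```

On this toy data the compact, well-separated partition scores a lower Davies-Bouldin value and a higher Dunn value than the mixed one, which is the sense in which such indices measure compactness and separation.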