Entropy Based Clustering to Determine Discriminatory Genes for Microarray Dataset

Rajni Bala, R K Agrawal

doi:10.1007/978-3-642-14834-7_38

Abstract

Microarray datasets suffer from the curse of dimensionality: each sample is described by a very large number of genes, while only a few samples are available. Efficient classification of samples therefore requires selecting a smaller set of relevant and non-redundant genes. In this paper, we propose a two-stage algorithm, GSUCE, for finding a set of discriminatory genes responsible for classification in high-dimensional microarray datasets. In the first stage, correlated genes are grouped into clusters and the best gene is selected from each cluster to create a pool of independent genes, which reduces redundancy. Similarity between genes is measured with the maximal information compression index. In the second stage, a wrapper-based forward feature selection method is used to obtain a set of informative genes for a given classifier. The proposed algorithm is tested on five well-known, publicly available datasets. Comparison with other state-of-the-art methods shows that the proposed algorithm achieves better classification accuracy with fewer features.
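The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the GSUCE implementation: the abstract does not specify the clustering procedure, the rule for picking the best gene in a cluster, or the classifier. The sketch assumes the usual definition of the maximal information compression index as the smallest eigenvalue of the 2x2 covariance matrix of two genes (zero when the genes are linearly dependent). The names `mici`, `forward_select`, and the `score` callback are hypothetical; `score` stands in for whatever classifier accuracy estimate the wrapper uses.

```python
import math


def mici(x, y):
    """Smallest eigenvalue of the 2x2 covariance matrix of x and y.

    Assumed definition of the maximal information compression index:
    0 means the two genes are linearly dependent (fully redundant).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Eigenvalues of [[vx, cxy], [cxy, vy]] are (tr +/- sqrt(tr^2 - 4 det)) / 2.
    tr = vx + vy
    det = vx * vy - cxy ** 2
    disc = max(tr * tr - 4.0 * det, 0.0)  # guard against rounding below zero
    return (tr - math.sqrt(disc)) / 2.0


def forward_select(candidates, score, max_genes):
    """Greedy wrapper-style forward selection (illustrative only).

    `score(genes)` is a hypothetical callback returning, for example, a
    cross-validated accuracy of the chosen classifier on that gene subset.
    A gene is added only while it strictly improves the score.
    """
    selected = []
    remaining = list(candidates)
    best = float("-inf")
    while remaining and len(selected) < max_genes:
        top_score, top_gene = max(
            ((score(selected + [g]), g) for g in remaining),
            key=lambda t: t[0],
        )
        if top_score <= best:
            break  # no remaining gene improves the classifier
        selected.append(top_gene)
        remaining.remove(top_gene)
        best = top_score
    return selected
```

In a pipeline of this shape, `mici` would drive the first stage (genes with a small index are grouped together and one representative is kept per group), and `forward_select` would then run over the pool of representatives.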

Full Text