Filter vs. Wrapper approach for optimum gene selection of high dimensional gene expression dataset: An analysis with cancer datasets

Bhavna Srivastava, Rajeev Srivastava, Mahesh Jangid

doi:10.1109/ichpca.2014.7045359

Abstract

In bioinformatics, gene expression experiments generate thousands of expression measurements, which are generally used to collect information from tissue and cell samples about differences in gene expression. Optimal gene selection from such datasets, and their subsequent classification, plays an important role in disease prediction and diagnosis. The open question is which gene selection strategy yields the highest classification accuracy on such high-dimensional gene expression data: whether the filter approach is the more reliable choice or the wrapper approach is more suitable, and, beyond that, which classifier works well with each. To answer this question, this paper evaluates the performance of filter and wrapper gene selection techniques using supervised classifiers on three well-known public-domain datasets: Ovarian Cancer, Lymphomas, and Leukemia. For gene selection, ReliefF is used as the filter-based method and a random gene subset selection algorithm is used as the wrapper-based method. For classification, several linear classifiers as well as ensemble classifiers are tested. The paper also reports timing details, so that the analysis can identify which approach offers the better trade-off between computation time and classification accuracy on the selected datasets.
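To make the filter vs. wrapper distinction concrete, the following is a minimal sketch of such a comparison, not the implementation used in the paper. It rests on several assumptions: the data are synthetic (the cancer datasets are not reproduced here), basic Relief with one nearest hit and one nearest miss stands in for ReliefF, and a nearest-centroid classifier stands in for the linear and ensemble classifiers. All function names and parameter values (`make_data`, `relief_scores`, `random_subset_wrapper`, `k = 5`, `n_trials = 200`) are hypothetical and chosen only for illustration.

```python
import random
import time

def make_data(n_samples=60, n_genes=200, n_informative=5, seed=0):
    """Synthetic two-class expression matrix (assumption: not real cancer data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += 2.0 * label  # shift the informative genes for class 1
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X_fit, y_fit, X_eval, y_eval, genes):
    """Nearest-centroid accuracy using only the given gene indices."""
    cents = {}
    for label in set(y_fit):
        rows = [x for x, t in zip(X_fit, y_fit) if t == label]
        cents[label] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    correct = 0
    for x, t in zip(X_eval, y_eval):
        pred = min(cents, key=lambda c: sum((x[g] - m) ** 2
                                            for g, m in zip(genes, cents[c])))
        correct += pred == t
    return correct / len(X_eval)

def relief_scores(X, y):
    """Filter: basic Relief weights (a simplification of ReliefF)."""
    n, d = len(X), len(X[0])
    span = [(max(r[g] for r in X) - min(r[g] for r in X)) or 1.0 for g in range(d)]
    w = [0.0] * d
    for i in range(n):
        hit = miss = None  # (distance, index) of nearest same/other-class sample
        for j in range(n):
            if j == i:
                continue
            dist = sum(abs(X[i][g] - X[j][g]) / span[g] for g in range(d))
            if y[j] == y[i]:
                if hit is None or dist < hit[0]:
                    hit = (dist, j)
            elif miss is None or dist < miss[0]:
                miss = (dist, j)
        h, m = hit[1], miss[1]
        for g in range(d):
            # Reward genes that differ on the miss and agree on the hit.
            w[g] += (abs(X[i][g] - X[m][g]) - abs(X[i][g] - X[h][g])) / span[g] / n
    return w

def random_subset_wrapper(X_train, y_train, k, n_trials, seed=0):
    """Wrapper: score random gene subsets with the classifier, keep the best."""
    rng = random.Random(seed)
    half = len(X_train) // 2
    X_fit, y_fit = X_train[:half], y_train[:half]
    X_val, y_val = X_train[half:], y_train[half:]
    best_genes, best_acc = None, -1.0
    for _ in range(n_trials):
        genes = rng.sample(range(len(X_train[0])), k)
        acc = centroid_accuracy(X_fit, y_fit, X_val, y_val, genes)
        if acc > best_acc:
            best_genes, best_acc = genes, acc
    return best_genes

X, y = make_data()
X_train, y_train, X_test, y_test = X[:40], y[:40], X[40:], y[40:]
k = 5  # number of genes to keep (illustrative value)

t0 = time.perf_counter()
scores = relief_scores(X_train, y_train)
filter_genes = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]
filter_time = time.perf_counter() - t0
filter_acc = centroid_accuracy(X_train, y_train, X_test, y_test, filter_genes)

t0 = time.perf_counter()
wrapper_genes = random_subset_wrapper(X_train, y_train, k, n_trials=200)
wrapper_time = time.perf_counter() - t0
wrapper_acc = centroid_accuracy(X_train, y_train, X_test, y_test, wrapper_genes)

print(f"filter : genes={sorted(filter_genes)} acc={filter_acc:.2f} time={filter_time:.3f}s")
print(f"wrapper: genes={sorted(wrapper_genes)} acc={wrapper_acc:.2f} time={wrapper_time:.3f}s")
```

The filter ranks each gene once, independently of any classifier, whereas the wrapper retrains and re-evaluates the classifier for every candidate subset. That structural difference is why selection time and accuracy are both worth reporting; results from this toy setup say nothing about the paper's actual findings.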

Full Text