Abstract

The efficacy of cancer treatment depends on correctly diagnosing the nature of the tumor as early as possible. Microarray gene expression data, which contain the expression profiles of the entire genome, provide a source that can be analyzed to identify biomarkers of cancers. Microarray data have a large number of features and very few samples. To make effective use of these data, it is beneficial to select a reduced number of genes that can be used for tasks such as classification. In this paper, we propose a two-level scheme for feature selection and classification of cancers. First, the genes are ranked using Recursive Feature Elimination, with a Random Forest classifier evaluating the fitness of the genes under five-fold cross-validation. The selected genes are then used to pre-train an unsupervised Deep Belief Network classifier, which classifies the samples based on those genes. We compare our approach, in terms of the cross-validation metrics classification accuracy, precision, and recall, with several standard feature selector-classifier combinations: Mutual Information with Support Vector Machine, Kernel Principal Component Analysis with Support Vector Machine, Support Vector Machine-Recursive Feature Elimination, and Mutual Information with Random Forest classifier. The results show that our scheme performs on par with the standard methods used for feature selection from gene expression data.
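
The two-level scheme described above can be illustrated with a minimal sketch in Python using scikit-learn. This is not the authors' implementation. Several things here are assumptions made only for illustration: the synthetic data from `make_classification`, the helper names `build_pipeline` and `evaluate`, all hyperparameter values, and in particular the second stage, where a single `BernoulliRBM` layer followed by logistic regression stands in for a Deep Belief Network (scikit-learn does not provide one). Feature selection is placed inside the pipeline so that it is refit on each training fold.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def build_pipeline():
    """Hypothetical two-stage pipeline: RF-based RFE, then an RBM-based classifier."""
    # Stage 1: recursive feature elimination, scored by a Random Forest
    # under five-fold cross-validation.
    selector = RFECV(
        estimator=RandomForestClassifier(n_estimators=25, random_state=0),
        step=0.25,                  # drop 25% of the original features per round
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
        scoring="accuracy",
        min_features_to_select=10,  # assumed lower bound, not from the paper
    )
    # Stage 2 (stand-in for a DBN): one unsupervised RBM layer learns a
    # representation, and logistic regression classifies on top of it.
    return Pipeline([
        ("select", selector),
        ("scale", MinMaxScaler()),  # rescale to roughly [0, 1] for the RBM
        ("rbm", BernoulliRBM(n_components=16, learning_rate=0.05,
                             n_iter=20, random_state=0)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


def evaluate(X, y):
    """Return mean accuracy, precision, and recall over five outer folds."""
    outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
    result = cross_validate(build_pipeline(), X, y, cv=outer_cv,
                            scoring=["accuracy", "precision", "recall"])
    return {name: result[f"test_{name}"].mean()
            for name in ("accuracy", "precision", "recall")}


if __name__ == "__main__":
    # Synthetic stand-in for microarray data: few samples, many features.
    X, y = make_classification(n_samples=60, n_features=100,
                               n_informative=10, random_state=0)
    print(evaluate(X, y))
```

Wrapping the selector and classifier in one `Pipeline` keeps the held-out fold out of the gene-selection step, which matters when samples are as scarce as they are in microarray studies.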
