Abstract

BackgroundSeveral classification and feature selection methods have been studied for the identification of differentially expressed genes in microarray data. Classification methods such as SVM, RBF Neural Nets, MLP Neural Nets, Bayesian, Decision Tree and Random Forrest methods have been used in recent studies. The accuracy of these methods has been calculated with validation methods such as v-fold validation. However there is lack of comparison between these methods to find a better framework for classification, clustering and analysis of microarray gene expression results.ResultsIn this study, we compared the efficiency of the classification methods including; SVM, RBF Neural Nets, MLP Neural Nets, Bayesian, Decision Tree and Random Forrest methods. The v-fold cross validation was used to calculate the accuracy of the classifiers. Some of the common clustering methods including K-means, DBC, and EM clustering were applied to the datasets and the efficiency of these methods have been analysed. Further the efficiency of the feature selection methods including support vector machine recursive feature elimination (SVM-RFE), Chi Squared, and CSF were compared. In each case these methods were applied to eight different binary (two class) microarray datasets. We evaluated the class prediction efficiency of each gene list in training and test cross-validation using supervised classifiers.ConclusionsWe presented a study in which we compared some of the common used classification, clustering, and feature selection methods. We applied these methods to eight publicly available datasets, and compared how these methods performed in class prediction of test datasets. We reported that the choice of feature selection methods, the number of genes in the gene list, the number of cases (samples) substantially influence classification success. Based on features chosen by these methods, error rates and accuracy of several classification algorithms were obtained. Results revealed the importance of feature selection in accurately classifying new samples and how an integrated feature selection and classification algorithm is performing and is capable of identifying significant genes.

Highlights

  • Several classification and feature selection methods have been studied for the identification of differentially expressed genes in microarray data

  • Microarray technology allows scientists to monitor the expression of genes on a genomic scale. It increases the possibility of cancer classification and diagnosis at the gene expression level. Several classification methods such as radial basic function (RBF) Neural Nets, Multi-layer perceptron (MLP) Neural Nets, Bayesian, Decision Tree and Random Forrest methods have been used in recent studies for the identification of differentially expressed genes in microarray data

  • We compared the efficiency of the classification methods; Support Vector Machines (SVM), RBF Neural Nets, MLP Neural Nets, Bayesian, Decision Tree and Random Forrest methods

Read more

Summary

Introduction

Several classification and feature selection methods have been studied for the identification of differentially expressed genes in microarray data Classification methods such as SVM, RBF Neural Nets, MLP Neural Nets, Bayesian, Decision Tree and Random Forrest methods have been used in recent studies. It increases the possibility of cancer classification and diagnosis at the gene expression level Several classification methods such as RBF Neural Nets, MLP Neural Nets, Bayesian, Decision Tree and Random Forrest methods have been used in recent studies for the identification of differentially expressed genes in microarray data. Further we compared the efficiency of the feature selection methods; support vector machine recursive feature elimination (SVM-RFE) [1][2], Chi Squared [3], and CSF [4][5] In each case these methods were applied to eight different binary (two class) microarray datasets. Their efficiencies are investigated by comparing error rate of classification algorithms applied to only these selected features versus all features

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.