Abstract

Gene expression profiling reveals the activity of thousands of genes, which can help identify cancer biomarkers. However, the large number of genes in these profiles imposes a high computational burden on classifiers. To cope with this high-dimensional feature space, we introduce ANOVA-SRC-BPSO, a three-phase feature selection framework. In the first phase, ANOVA-SRC-BPSO identifies the genes most strongly associated with the class labels using analysis of variance (ANOVA) and the F-test. In the second phase, we use Spearman rank-order correlation (SRC) to eliminate redundant genes. Finally, we combine binary particle swarm optimization (BPSO) with a support vector machine (SVM) classifier to select an optimized feature subset. We report the accuracy of ANOVA-SRC-BPSO with the SVM classifier on seven gene expression datasets. Comparisons with fourteen state-of-the-art methods show that ANOVA-SRC-BPSO achieves the highest accuracy on five of these datasets. Moreover, we show that the performance of different feature selection approaches is inconsistent across gene expression datasets.
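The three phases can be illustrated with a minimal, standard-library-only sketch. It rests on several assumptions that are not taken from the paper: the number of genes retained after the ANOVA phase (`top_k`) and the Spearman redundancy threshold (`max_rho`) are hypothetical parameters, the BPSO coefficients are illustrative defaults, the fitness is plain accuracy with no subset-size penalty, and a leave-one-out nearest-centroid classifier (`loo_centroid_accuracy`, a hypothetical helper) is swapped in for the SVM so the example needs no third-party libraries. It is a sketch of the general technique, not the authors' implementation.

```python
import math
import random
from statistics import mean


def anova_f(values, labels):
    """Phase 1 score: one-way ANOVA F-statistic of one gene across class labels."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    grand = mean(values)
    means = {c: mean(g) for c, g in groups.items()}
    ss_between = sum(len(g) * (means[c] - grand) ** 2 for c, g in groups.items())
    ss_within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    if ss_within == 0:  # zero within-class variance: infinite F, or 0 for a constant gene
        return math.inf if ss_between > 0 else 0.0
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(a, b):
    """Phase 2 measure: Spearman's rho, i.e. Pearson correlation of the rank vectors."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    norm = math.sqrt(sum((x - ma) ** 2 for x in ra) * sum((y - mb) ** 2 for y in rb))
    return cov / norm if norm else 0.0


def filter_genes(X, y, top_k, max_rho):
    """Phases 1-2. X has one row per sample; top_k and max_rho are hypothetical knobs."""
    genes = list(zip(*X))  # transpose: one tuple per gene
    scores = [anova_f(g, y) for g in genes]
    # Phase 1: keep the top_k genes by F-statistic, highest first.
    ranked = sorted(range(len(genes)), key=lambda j: scores[j], reverse=True)[:top_k]
    kept = []
    # Phase 2: drop a gene if it is too rank-correlated with a higher-scoring kept gene.
    for j in ranked:
        if all(abs(spearman(genes[j], genes[k])) <= max_rho for k in kept):
            kept.append(j)
    return kept


def loo_centroid_accuracy(X, y, mask):
    """Stand-in fitness, NOT an SVM: leave-one-out nearest-centroid accuracy on masked genes."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        # Class centroids computed without sample i (assumes >= 2 samples per class).
        cents = {c: [mean(X[r][j] for r in range(len(X)) if r != i and y[r] == c) for j in cols]
                 for c in set(y)}
        pred = min(cents, key=lambda c: sum((X[i][j] - m) ** 2 for j, m in zip(cols, cents[c])))
        correct += pred == y[i]
    return correct / len(X)


def bpso(n_features, fitness, n_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Phase 3: binary PSO maximizing fitness(mask); coefficients are illustrative defaults."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp so every bit can still flip
                # Sigmoid transfer: the bit is set with probability sigmoid(velocity).
                pos[i][d] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit
```

The ordering matters: the cheap filter phases shrink the candidate set before the expensive wrapper phase runs, so in practice one would likely call `bpso` only on the columns returned by `filter_genes` and replace the stand-in fitness with cross-validated SVM accuracy (for example, scikit-learn's `SVC`).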
