Comparison of hybrid feature selection models on gene expression data

Patharawut Saengsiri, Sageemas Na Wichian, Unger Herwig, Phayung Meesad

doi:10.1109/ictke.2010.5692905

Abstract

Microarray data contain thousands of genes whose expression levels are measured simultaneously. However, most of these genes are not associated with cancer, and their presence leads to the curse of dimensionality. A key challenge with microarray data is therefore feature selection, which searches for subsets of informative genes. Current techniques rely mainly on filter and wrapper approaches to discover such subsets. Filter approaches require less computation time than wrapper approaches, whereas wrapper approaches achieve higher accuracy than filter approaches. It would be more beneficial to reduce processing time and increase accuracy at the same time when searching for subsets of genes. This paper therefore compares hybrid feature selection models on gene expression datasets in four steps: 1) filter subgroups of genes using Correlation-based Feature Selection (CFS), Gain Ratio (GR), and Information Gain (INFO); 2) pass the output of each filter method to a wrapper approach based on a Support Vector Machine (SVM) classifier combined with two heuristic searches, Greedy Search (GS) and Genetic Algorithm (GA); 3) generate the hybrid feature selection models CFSSVMGA, CFSSVMGS, GRSVMGA, GRSVMGS, INFOSVMGA, and INFOSVMGS; and 4) compare their performance using precision, recall, F-measure, and accuracy. Experimental results show that the CFSSVMGA model outperformed the other models on three public gene expression datasets.
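The Information Gain (INFO) and Gain Ratio (GR) filters score each gene independently against the class label and keep the top-ranked genes. The sketch below shows the standard entropy-based definitions, assuming each gene's expression values have already been discretized (for example into "hi"/"lo" bins); the helper names `entropy`, `info_gain`, `gain_ratio`, and `rank_genes` and the `top_k` cutoff are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain(feature, labels):
    """Information gain of one discretized gene with respect to the class labels."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        # Class labels of the samples where this gene takes the given value.
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalized by the split information (entropy of the gene itself)."""
    split_info = entropy(feature)
    return info_gain(feature, labels) / split_info if split_info > 0 else 0.0


def rank_genes(columns, labels, score, top_k):
    """Return the indices of the top_k genes under the given filter score (hypothetical helper)."""
    scores = [score(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)[:top_k]
```

Gain Ratio divides by the split information so that genes with many distinct discretized values are not favored merely for fragmenting the samples, which is the usual motivation for preferring it over raw Information Gain.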
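Unlike INFO and GR, Correlation-based Feature Selection (CFS) scores a whole subset of genes: it rewards genes that correlate with the class and penalizes genes that correlate with each other. The standard CFS merit heuristic for a subset of $k$ genes is $\mathrm{Merit}_S = k\,\bar r_{cf} / \sqrt{k + k(k-1)\,\bar r_{ff}}$, where $\bar r_{cf}$ is the mean gene-class correlation and $\bar r_{ff}$ the mean gene-gene correlation. The sketch below assumes a binary class encoded as 0/1 and uses absolute Pearson correlation as the correlation measure for brevity; that choice is an assumption, and CFS implementations commonly use other measures such as symmetric uncertainty.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def cfs_merit(subset, columns, target):
    """CFS merit of a gene subset (indices into columns) against a 0/1 class target."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute gene-class correlation.
    r_cf = sum(abs(pearson(columns[i], target)) for i in subset) / k
    # Mean absolute gene-gene correlation over all unordered pairs (0 for a single gene).
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

Adding a gene that is a perfect copy of one already selected leaves the merit unchanged, which is how CFS discourages redundant genes; a search procedure then looks for the subset with the highest merit.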
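In the wrapper stage, the genes kept by a filter become the candidate pool, and a search procedure repeatedly asks the classifier how good a candidate subset is. The sketch below shows greedy search in its forward-selection form (an assumption; the abstract does not say whether the greedy search adds or removes genes). The `evaluate` callable is a hypothetical stand-in for the paper's SVM-based evaluation; with scikit-learn installed it could, for instance, return `cross_val_score(SVC(), X[:, subset], y, cv=5).mean()`.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_forward_selection(
    candidates: Sequence[int],
    evaluate: Callable[[List[int]], float],
) -> Tuple[List[int], float]:
    """Add one gene at a time, keeping the addition that most improves the score.

    Stops as soon as no remaining candidate improves on the current best score.
    """
    selected: List[int] = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining:
        # Score every one-gene extension of the current subset.
        scored = [(evaluate(selected + [g]), g) for g in remaining]
        # key= picks the first candidate on ties instead of comparing gene ids.
        score, gene = max(scored, key=lambda t: t[0])
        if score <= best_score:
            break
        selected.append(gene)
        remaining.remove(gene)
        best_score = score
    return selected, best_score
```

Because every step calls the classifier once per remaining candidate, running the filter first to shrink the candidate pool is what keeps the wrapper affordable on thousands of genes.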
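The Genetic Algorithm (GA) search explores subsets as bit masks (1 = gene kept) rather than growing one subset greedily. The minimal sketch below uses tournament selection, one-point crossover, bit-flip mutation, and elitism; these operators and all parameter values (`pop_size`, `generations`, `crossover_rate`, `mutation_rate`) are illustrative assumptions, not the settings used in the paper. As with greedy search, `fitness` is a hypothetical stand-in for the SVM-based evaluation of the masked genes.

```python
import random
from typing import Callable, List, Tuple


def genetic_feature_selection(
    n_genes: int,
    fitness: Callable[[List[int]], float],
    pop_size: int = 20,
    generations: int = 30,
    crossover_rate: float = 0.8,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Search gene bit masks with a simple GA; returns the best mask seen and its fitness."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best_mask: List[int] = []
    best_score = float("-inf")

    def tournament(scored):
        # Binary tournament: the fitter of two random individuals becomes a parent.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = [(fitness(mask), mask) for mask in population]
        for score, mask in scored:
            if score > best_score:
                best_score, best_mask = score, mask[:]
        next_pop = [best_mask[:]]  # elitism: carry the best mask forward unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if n_genes > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_genes - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: each gene is toggled with probability mutation_rate.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            next_pop.append(child)
        population = next_pop
    return best_mask, best_score
```

Fixing `seed` makes a run reproducible; because a GA is stochastic, comparing it fairly against greedy search usually means repeating it with several seeds.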
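The final comparison step uses four standard measures computed from the classifier's predictions: precision $= TP/(TP+FP)$, recall $= TP/(TP+FN)$, F-measure $= 2PR/(P+R)$, and accuracy $=$ correct predictions / total predictions. The sketch below computes them for one designated positive class; the function name and the convention of returning 0.0 when a denominator is zero are assumptions for illustration.

```python
from typing import Dict, Hashable, Sequence


def classification_metrics(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable,
) -> Dict[str, float]:
    """Precision, recall, F-measure (for one positive class) and overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": correct / len(pairs),
    }


# Example: one of two positive predictions is right, one of three positives is found.
metrics = classification_metrics([1, 1, 1, 0, 0, 0], [1, 0, 0, 0, 0, 1], positive=1)
```

For this example, precision is 0.5, recall is 1/3, F-measure is 0.4, and accuracy is 0.5. With more than two classes, a common convention is to compute precision, recall, and F-measure per class and then report a class-weighted average.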
