Microarray gene expression profiles provide valuable answers to a variety of problems and contribute to advances in clinical medicine. Gene expression data typically have a high dimension and a small sample size: the number of samples in a microarray dataset is much smaller than the number of genes used as features. This high dimensionality makes gene selection from microarray gene expression data a challenge. To extract useful gene information from cancer microarray data and reduce dimensionality, selection of significant genes is necessary. An effective gene feature selection method helps in dimensionality reduction and improves classification performance. Experimental results suggest that an appropriate combination of filter gene selection methods is more effective than individual techniques for microarray data classification. In this paper, we propose a two-layered feature selection method. In the first layer, a t-test is used to remove the features that have little correlation with the classification results. In the second layer, a line segment approximation method is used to transform the feature subset into a lower-dimensional feature space. Four well-known classifiers, k-nearest neighbors (kNN), support vector machine (SVM), naive Bayes classifier (NBC), and decision tree (DT), were used to verify the performance of the proposed feature selection algorithm on binary-class microarray data. The experimental results show that the proposed method can effectively select relevant gene subsets and achieves higher classification accuracy.
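
As an illustration of the first layer only, the following minimal sketch ranks genes by a two-sample t-statistic and keeps the top k. The abstract does not state which t-test variant, threshold, or ranking rule is used, so the Welch (unequal-variance) statistic, the top-k cutoff, and the names `welch_t` and `select_genes_by_t_test` are assumptions made for this example. The second layer (line segment approximation) is not sketched because the abstract gives no details of it.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (assumed variant)."""
    diff = mean(a) - mean(b)
    # Standard error from the two sample variances (n - 1 denominator).
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        # Both groups are constant: no difference, or a perfect separation.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def select_genes_by_t_test(X, y, k):
    """Return indices of the k genes with the largest |t| between classes.

    X: list of samples, each a list of expression values (one per gene).
    y: list of binary class labels (0 or 1), one per sample.
    """
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        class0 = [row[g] for row, label in zip(X, y) if label == 0]
        class1 = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append(abs(welch_t(class0, class1)))
    # Rank genes from most to least discriminative and keep the first k.
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return ranked[:k]


# Toy data (made up for illustration): 6 samples, 3 genes, 2 classes.
X = [
    [1.0, 2.0, 1.0],
    [1.1, 3.0, 2.0],
    [0.9, 2.5, 3.0],
    [5.0, 2.4, 2.0],
    [5.2, 2.9, 3.0],
    [4.9, 2.1, 4.0],
]
y = [0, 0, 0, 1, 1, 1]
selected = select_genes_by_t_test(X, y, k=2)
```

In practice the reduced matrix (the columns in `selected`) would then be passed to the second layer and to the classifiers; a p-value threshold could replace the top-k cutoff if that is closer to the intended procedure.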