Optimized Feature Selection and Classification in Microarray Gene Expression Cancer Data

B Lakshmanan, T Jenitha

doi:10.37506/ijphrd.v11i1.459

Abstract

Cancer classification can be performed using microarray gene expression data, which comprise thousands of genes but only a small number of samples. Gene expression data offer an efficient means of finding which genes cause cancer in humans. In this work, we formulate a hybrid model combining a filter approach, a wrapper approach and a partial least squares method to select optimized features from the high-dimensional dataset. The filter approach uses mutual information, the wrapper approach uses a genetic algorithm, and the partial least squares method uses t-score estimation as the feature selection mechanism. Classification is then performed on the reduced dataset to label each sample as normal or abnormal. Both feature selection and dimension reduction are performed to improve classification accuracy. Feature selection identifies the genes most likely to be cancer related from the large microarray gene expression data. The trained classifier is tested on benchmark cancer datasets: the colon cancer dataset, comprising 62 samples (40 tumor and 22 normal) with 2000 genes, and the prostate cancer dataset, comprising 136 samples (59 normal and 75 tumor) with 12,600 genes. The proposed model achieves an accuracy of 92.7% for the wrapper approach with optimal features, and this approach also outperforms the other two with respect to accuracy and time complexity.
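As an illustration of the filter step only, the following is a minimal sketch of ranking genes by their mutual information with the class label. It is not the authors' implementation: the median-based binarization, the function names (`discretize`, `mutual_information`, `rank_genes`) and the toy expression matrix are all assumptions made for the example, and the genetic algorithm and partial least squares stages are not shown.

```python
import math
from collections import Counter


def discretize(values):
    """Binarize expression values at their median (an assumed, simple binning)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return [1 if v > median else 0 for v in values]


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_genes(expression, labels, top_k):
    """Return the indices of the top_k genes with the highest mutual information.

    expression: list of samples, each a list of per-gene expression values.
    labels: list of class labels (e.g. 1 = tumor, 0 = normal), one per sample.
    """
    n_genes = len(expression[0])
    scored = []
    for g in range(n_genes):
        column = [sample[g] for sample in expression]
        scored.append((mutual_information(discretize(column), labels), g))
    # Highest score first; ties broken by gene index for reproducibility.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [g for _, g in scored[:top_k]]


# Hypothetical toy data: 6 samples x 3 genes. Gene 0 separates the classes.
expression = [
    [5.1, 2.0, 7.3],
    [4.9, 2.1, 1.2],
    [5.0, 8.0, 6.9],
    [1.0, 8.2, 1.0],
    [1.2, 7.9, 7.1],
    [0.9, 2.2, 1.1],
]
labels = [1, 1, 1, 0, 0, 0]

top_genes = rank_genes(expression, labels, top_k=1)
print(top_genes)
```

In a pipeline of the kind described above, the indices returned by a filter like this would define a reduced gene subset that a wrapper search and a classifier could then operate on.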
