Abstract

Gene microarray classification is a challenging task because the datasets contain few samples but a large number of genes (features). Gene subset selection in microarray data plays an important role in minimizing the computational load and solving classification problems. In this paper, the Correlation-based Feature Selection (CFS) algorithm is used in the feature selection process to reduce the dimensionality of the data and find a set of discriminatory genes. Then, the Decision Table, JRip, and OneR classifiers are employed for classification. The proposed approach to gene selection and classification is tested on 11 microarray datasets, and the performance on the filtered datasets is compared with that on the original datasets. The experimental results show that CFS can effectively screen out irrelevant, redundant, and noisy features. In addition, the results for all datasets show that the proposed approach can achieve high prediction accuracy and fast computation with a small number of genes. In terms of average accuracy across all the microarray datasets analyzed, JRip achieved the best result compared with the Decision Table and OneR classifiers. The proposed approach has a remarkable impact on classification accuracy, especially when the data are complicated by multiple classes and a large number of genes.

Highlights

  • Cancer is considered one of the most dreadful diseases, and diagnosing it at an early stage is very important for proper treatment [11]

  • The Decision Table, JRip, and OneR classifiers were applied on the original datasets

  • The results show that the number of selected genes for Breast Cancer is reduced from 24481 to 138, Central Nervous System (CNS) from 7129 to 39, Colon Tumor from 2000 to 26, Leukemia from 7129 to 79, Leukemia_3C from 7129 to 104, Leukemia_4C from 7129 to 119, Lung Cancer from 12600 to 548, Lymphoma from 4026 to 175, Mixed-Lineage Leukemia (MLL) from 12582 to 142, Ovarian Cancer from 15154 to 35, and Small Round Blue-Cell Tumor (SRBCT) from 2308 to 112 genes
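
To put the reductions listed above in proportion, the following short sketch computes the share of genes retained after selection for each dataset. The gene counts are copied from the highlight above; the variable names are illustrative only and are not taken from the paper.

```python
# Gene counts before and after selection, as reported in the highlight above.
# Format: dataset name -> (original genes, selected genes)
gene_counts = {
    "Breast Cancer": (24481, 138),
    "CNS": (7129, 39),
    "Colon Tumor": (2000, 26),
    "Leukemia": (7129, 79),
    "Leukemia_3C": (7129, 104),
    "Leukemia_4C": (7129, 119),
    "Lung Cancer": (12600, 548),
    "Lymphoma": (4026, 175),
    "MLL": (12582, 142),
    "Ovarian Cancer": (15154, 35),
    "SRBCT": (2308, 112),
}

# Percentage of the original genes that remain after selection.
retained_pct = {
    name: 100.0 * selected / original
    for name, (original, selected) in gene_counts.items()
}

for name, pct in retained_pct.items():
    original, selected = gene_counts[name]
    print(f"{name:15s} {original:6d} -> {selected:4d}  ({pct:.2f}% retained)")
```

In every case, only a small percentage of the original genes is kept.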


Summary

Introduction

Cancer is considered one of the most dreadful diseases, and diagnosing it at an early stage is very important for proper treatment [11]. Different meta-heuristic algorithms have been adapted for feature selection problems [19][29]. Examples of these algorithms are Principal Component Analysis [34], Genetic Algorithm [3], Ant Colony Optimization [9], Simulated Annealing [16], and Particle Swarm Optimization [5][33]. Correlation-based Feature Selection (CFS) is a simple filter algorithm that ranks feature subsets according to a correlation-based heuristic evaluation function [38]. CFS evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them [19]. Greedy Stepwise is used as the search method with the CFS algorithm.
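
The idea of scoring a subset by predictive ability and redundancy, combined with a greedy search, can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the standard CFS merit, k * r_cf / sqrt(k + k * (k - 1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. It also assumes absolute Pearson correlation as the correlation measure and a forward-only greedy search that stops when the merit no longer improves. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def cfs_merit(X, y, subset):
    """CFS heuristic: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    Assumption: absolute Pearson correlation is used for both terms.
    Constant columns would yield NaN correlations and are not handled here.
    """
    k = len(subset)
    # Mean absolute correlation between each selected feature and the class.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return float(r_cf)
    # Mean absolute correlation over all pairs of distinct selected features.
    corr = np.abs(np.corrcoef(X[:, subset], rowvar=False))
    r_ff = (corr.sum() - np.trace(corr)) / (k * (k - 1))
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))


def greedy_stepwise_cfs(X, y):
    """Forward greedy search: add the feature that most improves the merit."""
    selected, best_merit = [], 0.0
    remaining = set(range(X.shape[1]))
    while remaining:
        # Score every candidate extension of the current subset.
        scores = {j: cfs_merit(X, y, selected + [j]) for j in sorted(remaining)}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_merit:
            break  # no candidate improves the merit, so stop
        selected.append(j_best)
        best_merit = scores[j_best]
        remaining.remove(j_best)
    return selected, best_merit


# Synthetic stand-in for a microarray matrix (samples x genes).
rng = np.random.default_rng(0)
n_samples, n_features = 60, 20
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, n_features))
X[:, 0] = y + 0.3 * rng.normal(size=n_samples)          # informative
X[:, 1] = -y + 0.3 * rng.normal(size=n_samples)         # informative
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=n_samples)   # near-copy of feature 0

selected, merit = greedy_stepwise_cfs(X, y)
print("selected features:", selected, "merit:", round(merit, 3))
```

The denominator grows with the average inter-feature correlation, so a feature that largely duplicates one already selected adds little to the merit even when it is strongly correlated with the class. For discrete attributes, an information-based measure such as symmetrical uncertainty is a common alternative to Pearson correlation.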

Background
Datasets
Correlation based feature selection algorithm
Classification model
Experimental Design and Results Discussion
Conclusion
Authors
