Abstract
Gene microarray classification is a challenging task because the datasets contain few samples but a large number of genes (features). Gene subset selection in microarray data plays an important role in reducing the computational load and in solving classification problems. In this paper, the Correlation-based Feature Selection (CFS) algorithm is used in the feature selection process to reduce the dimensionality of the data and to find a set of discriminatory genes. The Decision Table, JRip, and OneR classifiers are then employed for classification. The proposed gene selection and classification approach is tested on 11 microarray datasets, and performance on the filtered datasets is compared with performance on the original datasets. The experimental results show that CFS can effectively screen out irrelevant, redundant, and noisy features. In addition, the results across all datasets indicate that the proposed approach can achieve high prediction accuracy and fast computation with a small number of genes. In terms of average accuracy across all the microarray datasets analyzed, JRip achieved the best result compared with the Decision Table and OneR classifiers. The proposed approach has a notable impact on classification accuracy, especially when the data are complicated by multiple classes and a large number of genes.
Highlights
Cancer is considered one of the most dreaded diseases, and diagnosing it at an early stage is very important for proper treatment [11]
The Decision Table, JRip, and OneR classifiers were applied to the original datasets
The results show that CFS reduces the number of genes for Breast Cancer from 24481 to 138, Central Nervous System (CNS) from 7129 to 39, Colon Tumor from 2000 to 26, Leukemia from 7129 to 79, Leukemia_3C from 7129 to 104, Leukemia_4C from 7129 to 119, Lung Cancer from 12600 to 548, Lymphoma from 4026 to 175, Mixed-Lineage Leukemia (MLL) from 12582 to 142, Ovarian Cancer from 15154 to 35, and Small Round Blue-Cell Tumor (SRBCT) from 2308 to 112
Summary
Cancer is considered one of the most dreaded diseases, and diagnosing it at an early stage is very important for proper treatment [11]. Various meta-heuristic algorithms have been adapted to feature selection problems [19][29]. Examples of these algorithms are Principal Component Analysis [34], Genetic Algorithm [3], Ant Colony Optimization [9], Simulated Annealing [16], and Particle Swarm Optimization [5][33]. Correlation-based Feature Selection (CFS) is a simple filter algorithm that ranks feature subsets according to a correlation-based heuristic evaluation function [38]. CFS evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them [19]. Greedy Stepwise is used as the search method with the CFS algorithm.
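The trade-off between predictive ability and redundancy described above can be illustrated with a minimal sketch. It assumes the commonly cited CFS merit, k * r_cf / sqrt(k + k * (k - 1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. Absolute Pearson correlation is used here as a stand-in correlation measure, and a plain forward greedy search stands in for Greedy Stepwise; both are simplifying assumptions, not the paper's exact configuration. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def cfs_merit(X, y, subset):
    """CFS merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff).

    Assumption: absolute Pearson correlation is the correlation measure.
    """
    k = len(subset)
    # Mean absolute correlation between each selected feature and the class.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    # Mean absolute correlation over distinct feature pairs
    # (subtract the k diagonal entries, which are all 1).
    corr = np.abs(np.corrcoef(X[:, subset], rowvar=False))
    r_ff = (corr.sum() - k) / (k * (k - 1))
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)


def greedy_forward_cfs(X, y):
    """Add the feature that most improves the merit until nothing improves it."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        score, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if score <= best:
            break  # no candidate raises the merit, so stop
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best


# Synthetic data (hypothetical): one informative feature, a near-duplicate
# of it, and two pure-noise features.
rng = np.random.default_rng(0)
y = np.array([0.0, 1.0] * 30)
informative = y + rng.normal(scale=0.5, size=60)
near_copy = informative + rng.normal(scale=0.05, size=60)
X = np.column_stack([informative, near_copy,
                     rng.normal(size=60), rng.normal(size=60)])

selected, merit = greedy_forward_cfs(X, y)
print("selected feature indices:", selected)
print("subset merit:", round(float(merit), 3))
```

Under this formula, adding an exact duplicate of an already selected feature leaves the merit unchanged, because the extra feature-class correlation is cancelled by the increased redundancy term. This is the mechanism by which CFS discourages redundant genes.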
More From: International Journal of Online and Biomedical Engineering (iJOE)