Gene Microarray Cancer Classification using Correlation Based Feature Selection Algorithm and Rules Classifiers

Mohammad Subhi Al-Batah, Saleh Ali Alomari, Mowafaq Salem Alzboon, Belal Mohammad Zaqaibeh

doi:10.3991/ijoe.v15i08.10617

Abstract

Gene microarray classification is considered a challenging task because the datasets contain a small number of samples and a large number of genes (features). Gene subset selection in microarray data plays an important role in minimizing the computational load and solving classification problems. In this paper, the Correlation-based Feature Selection (CFS) algorithm is used in the feature selection process to reduce the dimensionality of the data and to find a set of discriminatory genes. The Decision Table, JRip, and OneR classifiers are then employed for classification. The proposed approach to gene selection and classification is tested on 11 microarray datasets, and the performance on the filtered datasets is compared with that on the original datasets. The experimental results show that CFS can effectively screen out irrelevant, redundant, and noisy features. In addition, the results for all datasets show that the proposed approach can achieve high prediction accuracy and fast computation with a small number of genes. In terms of average accuracy across all the microarray datasets analyzed, JRip achieved the best result compared with the Decision Table and OneR classifiers. The proposed approach has a remarkable impact on classification accuracy, especially when the data is complicated by multiple classes and a large number of genes.

Highlights

Cancer is considered one of the most dreadful diseases, and diagnosing it at an early stage is very important for proper treatment [11]
The Decision Table, JRip, and OneR classifiers were applied on the original datasets
The results show that the number of selected genes for Breast Cancer is reduced from 24481 to 138, Central Nervous System (CNS) from 7129 to 39, Colon Tumor from 2000 to 26, Leukemia from 7129 to 79, Leukemia_3C from 7129 to 104, Leukemia_4C from 7129 to 119, Lung Cancer from 12600 to 548, Lymphoma from 4026 to 175, Mixed-Lineage Leukemia (MLL) from 12582 to 142, Ovarian Cancer from 15154 to 35, and Small Round Blue-Cell Tumor (SRBCT) from 2308 to 112 genes
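To put the reductions listed above on a common scale, the short sketch below computes the percentage of genes removed for each dataset. The gene counts are copied from the highlight above; the names `gene_counts` and `reduction_percent` are hypothetical and introduced only for this illustration.

```python
# (original genes, genes kept after CFS), copied from the highlight above.
gene_counts = {
    "Breast Cancer": (24481, 138),
    "CNS": (7129, 39),
    "Colon Tumor": (2000, 26),
    "Leukemia": (7129, 79),
    "Leukemia_3C": (7129, 104),
    "Leukemia_4C": (7129, 119),
    "Lung Cancer": (12600, 548),
    "Lymphoma": (4026, 175),
    "MLL": (12582, 142),
    "Ovarian Cancer": (15154, 35),
    "SRBCT": (2308, 112),
}


def reduction_percent(original, selected):
    """Percentage of genes removed by feature selection."""
    return 100.0 * (1.0 - selected / original)


if __name__ == "__main__":
    for name, (original, selected) in gene_counts.items():
        pct = reduction_percent(original, selected)
        print(f"{name}: {original} -> {selected} genes ({pct:.2f}% removed)")
```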

Summary

Introduction

Cancer is considered one of the most dreadful diseases, and diagnosing it at an early stage is very important for proper treatment [11]. Different meta-heuristic algorithms have been adapted for feature selection problems [19][29]. Examples of these algorithms are Principal Component Analysis [34], Genetic Algorithm [3], Ant Colony Optimization [9], Simulated Annealing [16] and Particle Swarm Optimization [5][33]. Correlation-based Feature Selection (CFS) is a simple filter algorithm that ranks feature subsets according to a correlation-based heuristic evaluation function [38]. CFS evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them [19]. Greedy Stepwise is used as the search method with the CFS algorithm.
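The trade-off between predictive ability and redundancy described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the commonly cited form of the CFS heuristic, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. Absolute Pearson correlation is swapped in as the correlation measure (the text above does not say which measure is used), and the search is a plain forward greedy loop standing in for Greedy Stepwise. The function names `cfs_merit` and `greedy_forward_cfs` are hypothetical.

```python
import numpy as np


def cfs_merit(X, y, subset):
    """CFS merit of the feature indices in `subset`.

    Assumption: absolute Pearson correlation is the correlation measure,
    and no feature column is constant (which would give NaN).
    """
    k = len(subset)
    # Mean absolute feature-class correlation (predictive ability).
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return float(r_cf)
    # Mean absolute feature-feature correlation over distinct pairs (redundancy).
    corr = np.abs(np.corrcoef(X[:, subset], rowvar=False))
    r_ff = (corr.sum() - k) / (k * (k - 1))
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))


def greedy_forward_cfs(X, y):
    """Add the feature that most improves the merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        merit, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if merit <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best
```

Calling `greedy_forward_cfs(X, y)` with a samples-by-genes array `X` and numeric class labels `y` returns the chosen column indices and the merit of that subset. A feature that merely duplicates one already selected raises r_ff without raising r_cf, so it does not improve the merit, which is how the heuristic penalizes redundancy.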

Background

Datasets

Correlation based feature selection algorithm

Classification model

Experimental Design and Results Discussion

Conclusion

Authors

Journal: International Journal of Online and Biomedical Engineering (iJOE)
Publication Date: May 14, 2019
Citations: 24
License type: CC BY
