A novel gene selection algorithm for cancer classification using microarray datasets

Russul Alanni, Yong Xiang, Jingyu Hou, Hasseeb Azzawi

doi:10.1186/s12920-018-0447-6

Abstract

Background: Microarray datasets are an important medical diagnostic tool as they represent the states of a cell at the molecular level. Available microarray datasets for classifying cancer types generally have a fairly small sample size compared to the large number of genes involved. This imbalance is known as the curse of dimensionality, which is a challenging problem. Gene selection is a promising approach that addresses this problem and plays an important role in the development of efficient cancer classification, because only a small number of genes are related to the classification problem. Gene selection addresses many problems in microarray datasets, such as reducing the number of irrelevant and noisy genes and selecting the most related genes to improve the classification results.

Methods: An innovative Gene Selection Programming (GSP) method is proposed to select relevant genes for effective and efficient cancer classification. GSP is based on the Gene Expression Programming (GEP) method, with a newly defined population initialization algorithm, a new fitness function definition, and improved mutation and recombination operators. A Support Vector Machine (SVM) with a linear kernel serves as the classifier of the GSP.

Results: Experimental results on ten microarray cancer datasets demonstrate that GSP is effective and efficient in eliminating irrelevant and redundant genes/features from microarray datasets. The comprehensive evaluations and comparisons with other methods show that GSP gives a better compromise in terms of all three evaluation criteria, i.e., classification accuracy, number of selected genes, and computational cost. The gene set selected by GSP has shown superior performance in cancer classification compared to those selected by up-to-date representative gene selection methods.

Conclusion: The gene subset selected by GSP can achieve a higher classification accuracy with less processing time.
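The abstract names classification accuracy and number of selected genes among the evaluation criteria, but it does not state the fitness function itself. The following is a minimal sketch of how such a compromise could be scored, assuming a simple weighted sum; the function name `subset_fitness`, the `weight` parameter, and the candidate values are all hypothetical and are not taken from the paper.

```python
def subset_fitness(accuracy, n_selected, n_total, weight=0.8):
    """Hypothetical fitness: weighted mix of accuracy and subset compactness.

    This is an illustrative assumption, NOT the fitness function
    defined for GSP in the paper.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if not 0 < n_selected <= n_total:
        raise ValueError("n_selected must lie in (0, n_total]")
    # Compactness approaches 1 as fewer genes are kept.
    compactness = 1.0 - n_selected / n_total
    return weight * accuracy + (1.0 - weight) * compactness


# Two made-up candidates on a made-up 2000-gene dataset.
large_subset = subset_fitness(accuracy=0.95, n_selected=500, n_total=2000)
small_subset = subset_fitness(accuracy=0.94, n_selected=20, n_total=2000)
```

Under these assumed weights, the smaller subset scores higher despite its slightly lower accuracy, which is the kind of trade-off a gene selection method has to balance.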

Highlights

Microarray datasets are an important medical diagnostic tool as they represent the states of a cell at the molecular level
Results: We evaluate the performance of the Gene Selection Programming (GSP) method using ten microarray cancer datasets, which were downloaded from http://www.gems-system.org
Ev.1, the best setting for gene and head: To set the best values for the number of genes (g) of each chromosome and the size of the gene head (h) in the GSP method, we evaluated nine different settings to show their effect on the GSP performance results

Summary

Introduction

Microarray datasets are an important medical diagnostic tool as they represent the states of a cell at the molecular level. Available microarray datasets for classifying cancer types generally have a fairly small sample size compared to the large number of genes involved. This imbalance is known as the curse of dimensionality, and it makes classification difficult: gene expression data obtained by microarray technology typically contain a very large number of genes but only a small number of samples. Gene selection is the process of identifying a subset of informative genes from the original gene set. This gene subset enables researchers to obtain substantial insight into the genetic nature of the disease and the mechanisms responsible for it. Gene selection can also decrease computational costs and improve cancer classification performance [5, 6].
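To make the idea of gene selection concrete, here is a minimal sketch that ranks genes with a simple filter criterion (a signal-to-noise ratio between two classes) and keeps the top k. This is a deliberately simpler technique than GSP and is not the method proposed in the paper; the function names and the toy expression values are assumptions made for illustration.

```python
from statistics import mean, pstdev


def signal_to_noise(values, labels):
    """Absolute signal-to-noise ratio of one gene across classes 0 and 1."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    spread = pstdev(a) + pstdev(b)
    if spread == 0:
        # Constant within both classes: perfectly separating or useless.
        return float("inf") if mean(a) != mean(b) else 0.0
    return abs(mean(a) - mean(b)) / spread


def select_genes(samples, labels, k):
    """Return the indices of the k highest-scoring genes, best first."""
    n_genes = len(samples[0])
    scores = [
        signal_to_noise([row[g] for row in samples], labels)
        for g in range(n_genes)
    ]
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return ranked[:k]


# Toy data (made up): 6 samples x 4 genes; only gene 2 tracks the label.
samples = [
    [5.1, 2.0, 0.9, 3.3],
    [4.9, 2.2, 1.1, 3.1],
    [5.0, 1.9, 1.0, 3.4],
    [5.2, 2.1, 4.0, 3.3],
    [4.8, 2.0, 4.2, 3.2],
    [5.0, 2.3, 3.9, 3.4],
]
labels = [0, 0, 0, 1, 1, 1]
top_genes = select_genes(samples, labels, k=2)
```

A filter like this scores each gene independently of any classifier. A wrapper-style method such as the one described in this paper would instead evaluate candidate gene subsets with a classifier, which the abstract identifies as an SVM with a linear kernel.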


Journal: BMC Medical Genomics	Publication Date: Jan 15, 2019
Citations: 68	License type: open-access
