A Hybrid Feature Selection Method for Complex Diseases SNPs

Raid Alzubi, Naeem Ramzan, Abbes Amira, Hadeel Alzoubi

doi:10.1109/access.2017.2778268


Open Access


Journal: IEEE Access
Publication Date: Jan 1, 2018
Citations: 77
License type: CC BY 3.0

Affiliation: University of the West of Scotland, Qatar University

Abstract

Machine learning techniques have the potential to revolutionize medical diagnosis. Single Nucleotide Polymorphisms (SNPs) are one of the most important sources of human genome variability; thus, they have been implicated in several human diseases. To separate affected samples from normal ones, various techniques have been applied to SNPs. Achieving high classification accuracy in such a high-dimensional space is crucial for successful diagnosis and treatment. In this work, we propose an accurate hybrid feature selection method for detecting the most informative SNPs and selecting an optimal SNP subset. The proposed method fuses a filter method, Conditional Mutual Information Maximization (CMIM), with a wrapper method, support vector machine-recursive feature elimination (SVM-RFE). The performance of the proposed method was evaluated against four state-of-the-art feature selection methods (minimum redundancy maximum relevancy, fast correlation-based feature selection, CMIM, and ReliefF) using four classifiers (support vector machine, naive Bayes, linear discriminant analysis, and k-nearest neighbors) on five different SNP data sets obtained from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) genomics data repository. The experimental results demonstrate the efficiency of the proposed feature selection approach, which outperforms all of the compared feature selection algorithms and achieves up to 96% classification accuracy on the data sets used. In general, from these results we conclude that whole-genome SNPs can be efficiently employed to distinguish individuals affected by complex diseases from healthy ones.
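
The two-stage filter-then-wrapper pipeline described in the abstract can be sketched as follows. This is a minimal, dependency-free Python illustration under stated assumptions, not the authors' implementation: the abstract does not say how many SNPs the CMIM stage keeps, which SVM solver, kernel, or regularisation is used, or how many features SVM-RFE removes per round. Here genotypes are assumed to be small integer codes, labels are assumed to be 0 (control) / 1 (case), a linear SVM trained by Pegasos-style subgradient descent stands in for a standard SVM solver, one feature is removed per RFE round, and the function names (`cmim`, `svm_rfe`, `hybrid_select`), parameters (`k_filter`, `n_final`), and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(*cols):
    """Joint Shannon entropy (bits) of one or more discrete columns of equal length."""
    n = len(cols[0])
    counts = Counter(zip(*cols))
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mutual_info(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(x) + entropy(y) - entropy(x, y)


def cond_mutual_info(x, y, z):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def cmim(X, y, k):
    """Filter stage (greedy CMIM): each round keeps the feature whose worst-case
    information about y, conditioned on any already-selected feature, is largest."""
    n_features = len(X[0])
    cols = [[row[f] for row in X] for f in range(n_features)]
    score = [mutual_info(y, c) for c in cols]  # running minimum, starts at I(Y;X_f)
    selected = []
    for _ in range(min(k, n_features)):
        best = max((f for f in range(n_features) if f not in selected),
                   key=lambda f: score[f])
        selected.append(best)
        for f in range(n_features):
            if f not in selected:
                score[f] = min(score[f], cond_mutual_info(y, cols[f], cols[best]))
    return selected


def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Assumed stand-in solver: deterministic Pegasos-style subgradient descent on the
    L2-regularised hinge loss. y must be in {-1, +1}; the bias is the last weight
    and, for simplicity, is regularised too."""
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            xb = list(xi) + [1.0]
            margin = yi * sum(wj * xj for wj, xj in zip(w, xb))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regulariser (shrink) step
            if margin < 1:  # hinge loss active: step towards the sample
                w = [wj + eta * yi * xj for wj, xj in zip(w, xb)]
    return w


def svm_rfe(X, y, features, n_keep):
    """Wrapper stage: retrain a linear SVM and drop the feature with the smallest
    squared weight, one per round, until n_keep features remain."""
    remaining = list(features)
    while len(remaining) > n_keep:
        sub = [[row[f] for f in remaining] for row in X]
        w = train_linear_svm(sub, y)
        worst = min(range(len(remaining)), key=lambda i: w[i] ** 2)  # bias excluded
        remaining.pop(worst)
    return remaining


def hybrid_select(X, y, k_filter, n_final):
    """Hypothetical hybrid: CMIM shortlist of k_filter SNPs, then SVM-RFE down to
    n_final SNPs. y is assumed to be coded 0 (control) / 1 (case)."""
    shortlist = cmim(X, y, k_filter)
    signed = [1 if label == 1 else -1 for label in y]
    return svm_rfe(X, signed, shortlist, n_final)


if __name__ == "__main__":
    # Toy genotypes: column 0 tracks the label, column 1 is noise, column 2 is constant.
    X = [[0, 0, 1], [0, 1, 1], [0, 0, 1], [0, 1, 1],
         [2, 0, 1], [2, 1, 1], [2, 0, 1], [2, 1, 1]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(hybrid_select(X, y, k_filter=2, n_final=1))
```

The ordering reflects the usual motivation for such hybrids: the filter stage is model-free and cheap, so it shrinks the candidate pool before the wrapper stage, which retrains the classifier once per eliminated feature. On the toy data, only column 0 carries information about the label, so it is the SNP the sketch keeps.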

Highlights

The human genome is the complete set of deoxyribonucleic acid (DNA) sequences in humans.
The performance of the proposed method was evaluated against four state-of-the-art feature selection methods (minimum redundancy maximum relevancy, fast correlation-based feature selection, Conditional Mutual Information Maximization (CMIM), and ReliefF) using four classifiers (support vector machine, naive Bayes, linear discriminant analysis, and k-nearest neighbors) on five different Single Nucleotide Polymorphism (SNP) data sets obtained from the National Center for Biotechnology Information Gene Expression Omnibus genomics data repository.
In this work, we propose a hybrid feature selection model to select the optimal subset of SNPs.

Summary

Introduction

The human genome is the complete set of deoxyribonucleic acid (DNA) sequences in humans. It consists of approximately three billion base pairs; more than 99% of nucleotides are identical across the whole population, and individuals differ at less than 1% of positions. The majority of these genetic variations occur as Single Nucleotide Polymorphisms (SNPs). The main advantages that make SNPs preferable to microarray gene expression data are their stability, their high frequency, and the fact that they are easier and faster to collect [1]. In this context, many machine learning algorithms have been widely applied to SNP data classification. In most studies, the main challenge is the “curse of dimensionality”, because the number of samples (a few hundred) is far smaller than the number of SNPs (up to one million) [1].


Similar Papers

Hybrid feature selection method for autism spectrum disorder SNPs
Raid Alzubi ... Naeem Ramzan
01 Aug 2017

The feature selection bias problem in relation to high-dimensional gene data
Jerzy Krawczuk ... Tomasz Łukaszuk
Artificial Intelligence in Medicine | VOL. 66
14 Nov 2015

Absolute cosine-based SVM-RFE feature selection method for prostate histopathological grading
Shahnorbanun Sahran ... Suria Hayati Md Pauzi
Artificial Intelligence in Medicine | VOL. 87
19 Apr 2018

Improving the performance of SVM-RFE on classification of pancreatic cancer data
Jiapeng Yin ... Han Yu
01 Mar 2016
