Abstract

Background: In genomics, association studies of disease susceptibility depend on selecting a small number of genes that are more significant than the others. In this work, our goal was to compare computational tools with and without feature selection for predicting chronic fatigue syndrome (CFS) using genetic factors such as single nucleotide polymorphisms (SNPs).

Methods: We used the dataset from the previous study by the CDC Chronic Fatigue Syndrome Research Group. To uncover relationships between CFS and SNPs, we applied three classification algorithms: naive Bayes, the support vector machine algorithm, and the C4.5 decision tree algorithm. We also used two feature selection methods to identify a subset of influential SNPs. One was a hybrid feature selection approach combining the chi-squared and information-gain methods. The other was a wrapper-based feature selection method.

Results: The naive Bayes model with the wrapper-based approach performed best among the predictive models at inferring disease susceptibility from the complex relationship between CFS and SNPs.

Conclusion: We demonstrated that our approach is a promising method to assess the associations between CFS and SNPs.
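As an illustration of the hybrid filter named in the Methods, the sketch below scores each SNP against case/control status with both the chi-squared statistic and information gain. The abstract does not say how the two scores were combined, so the rule used here (keep SNPs that rank in the top k under both criteria) is an assumption, as are the function names, the 0/1/2 genotype coding, and the toy data.

```python
import math
from collections import Counter

def chi_squared(feature, labels):
    """Pearson chi-squared statistic for the genotype-by-class contingency table."""
    n = len(labels)
    f_counts, l_counts = Counter(feature), Counter(labels)
    joint = Counter(zip(feature, labels))
    stat = 0.0
    for f, nf in f_counts.items():
        for l, nl in l_counts.items():
            expected = nf * nl / n
            observed = joint.get((f, l), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(feature, labels):
    """Reduction in label entropy after splitting on the genotype."""
    n = len(labels)
    conditional = 0.0
    for f, nf in Counter(feature).items():
        subset = [l for x, l in zip(feature, labels) if x == f]
        conditional += (nf / n) * entropy(subset)
    return entropy(labels) - conditional

def hybrid_select(snps, labels, k):
    """Assumed combination rule: keep SNPs in the top k under BOTH criteria."""
    def top_k(score):
        ranked = sorted(snps, key=lambda name: score(snps[name], labels), reverse=True)
        return set(ranked[:k])
    return sorted(top_k(chi_squared) & top_k(information_gain))

# Toy data (hypothetical): 1 = CFS case, 0 = control; genotypes coded 0/1/2.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
snps = {
    "snp_a": [2, 2, 1, 2, 0, 0, 1, 0],  # tracks case/control status closely
    "snp_b": [0, 1, 2, 0, 1, 2, 0, 1],  # close to unrelated
    "snp_c": [1, 1, 1, 1, 1, 1, 1, 1],  # constant, carries no information
}
selected = hybrid_select(snps, labels, k=1)
print(selected)
```

On this toy input only `snp_a` survives both filters. A filter of this kind scores each SNP independently of any classifier, which is what distinguishes it from the wrapper-based method.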

Highlights

  • In the studies of genomics, it is essential to select a small number of genes that are more significant than the others for the association studies of disease susceptibility

  • We demonstrated that our approach is a promising method to assess the associations between chronic fatigue syndrome (CFS) and single nucleotide polymorphisms (SNPs)

  • Our results showed that the naive Bayes classifier with the wrapper-based approach was superior to the other algorithms we tested in our application, achieving the greatest area under the ROC curve (AUC) with the smallest number of SNPs in distinguishing between the CFS patients and controls
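To make the wrapper idea concrete, here is a minimal sketch of wrapper-based selection around a naive Bayes classifier: candidate SNP subsets are scored by how well the classifier itself performs on them. Several choices are assumptions rather than details from the study: the greedy forward search, leave-one-out accuracy as the score (the study reports AUC), add-one smoothing over three genotype values, and all names and toy data.

```python
import math
from collections import Counter

def nb_predict(train_rows, train_labels, row, features):
    """Categorical naive Bayes with add-one smoothing, restricted to `features`."""
    best_class, best_score = None, -math.inf
    for c, n_c in Counter(train_labels).items():
        score = math.log(n_c / len(train_labels))  # log prior
        for f in features:
            matches = sum(1 for r, l in zip(train_rows, train_labels)
                          if l == c and r[f] == row[f])
            # Assumes each genotype takes one of 3 values (0, 1, 2).
            score += math.log((matches + 1) / (n_c + 3))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of the classifier on the given feature subset."""
    hits = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        hits += nb_predict(train_rows, train_labels, rows[i], features) == labels[i]
    return hits / len(rows)

def forward_wrapper(rows, labels, candidates):
    """Greedy forward search: add the best feature until the score stops improving."""
    selected, best = [], loo_accuracy(rows, labels, [])
    while True:
        trials = [(loo_accuracy(rows, labels, selected + [f]), f)
                  for f in candidates if f not in selected]
        if not trials:
            break
        score, feature = max(trials)
        if score <= best:
            break
        selected.append(feature)
        best = score
    return selected, best

# Toy data (hypothetical): 1 = CFS case, 0 = control; genotypes coded 0/1/2.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
rows = [
    {"snp_a": 2, "snp_b": 0}, {"snp_a": 2, "snp_b": 1},
    {"snp_a": 1, "snp_b": 2}, {"snp_a": 2, "snp_b": 0},
    {"snp_a": 0, "snp_b": 1}, {"snp_a": 0, "snp_b": 2},
    {"snp_a": 1, "snp_b": 0}, {"snp_a": 0, "snp_b": 1},
]
selected, best = forward_wrapper(rows, labels, ["snp_a", "snp_b"])
print(selected, best)
```

Because the search stops as soon as adding a SNP no longer improves the score, a wrapper of this kind tends to return small subsets, which is consistent with the highlight about using the smallest number of SNPs.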


Summary

Introduction

In genomics, association studies of disease susceptibility depend on selecting a small number of genes that are more significant than the others. Our goal was to compare computational tools with and without feature selection for predicting chronic fatigue syndrome (CFS) using genetic factors such as single nucleotide polymorphisms (SNPs). SNPs can be used in clinical association studies to determine the contribution of genes to disease susceptibility or drug efficacy [6,7].
