Abstract

BackgroundDetecting single nucleotide polymorphism (SNP) interactions is an important and challenging task in genome-wide association studies (GWAS). Various efforts have been devoted to detect SNP interactions. However, the large volume of SNP datasets results in such a big number of high-order SNP combinations that restrict the power of detecting interactions.MethodsIn this paper, to combat with this challenge, we propose a two-stage approach (called HiSSI) to detect high-order SNP-SNP interactions. In the screening stage, HiSSI employs a statistically significant pattern that takes into account family wise error rate, to control false positives and to effectively screen two-locus combinations candidate set. In the searching stage, HiSSI applies two different search strategies (exhaustive search and heuristic search based on differential evolution along with χ2-test) on candidate pairwise SNP combinations to detect high-order SNP interactions.ResultsExtensive experiments on simulated datasets are conducted to evaluate HiSSI and recently proposed and related approaches on both two-locus and three-locus disease models. A real genome-wide dataset: breast cancer dataset collected from the Wellcome Trust Case Control Consortium (WTCCC) is also used to test HiSSI.ConclusionsSimulated experiments on both two-locus and three-locus disease models show that HiSSI is more powerful than other related approaches. Real experiment on breast cancer dataset, in which HiSSI detects some significantly two-locus and three-locus interactions associated with breast cancer, again corroborate the effectiveness of HiSSI in high-order SNP-SNP interaction identification.

Highlights

  • Detecting single nucleotide polymorphism (SNP) interactions is an important and challenging task in genome-wide association studies (GWAS)

  • We propose a two-stage approach named High-order SNP-SNP interactions detection (HiSSI) to detect high-order SNP interactions based on candidate pairwise SNP combinations

  • HiSSI-breast cancer (BC) is a variant of HiSSI, it obtains the corrected significant threshold using the Bonferroni correction

Read more

Summary

Introduction

Detecting single nucleotide polymorphism (SNP) interactions is an important and challenging task in genome-wide association studies (GWAS). It has been widely recognized that single nucleotide polymorphisms (SNPs) are associated with a variety of human complex diseases. Genome-wide association study (GWAS) has became a powerful tool for detecting SNPs and detected hundreds of single SNPs associated with complex diseases [1]. Existing approaches for searching two-locus SNP interactions can be grouped into three categories: exhaustive search, stochastic search and machine learning based search. Methods based on exhaustive search enumerate all possible SNP combinations of two-locus. Stochastic methods use the random sampling procedures to search the space of SNP combinations. Bureau et al [6] focused on measures of predictive importance and applied random forest to discover predictive polymorphisms or markers of a phenotype, which are likely to affect disease susceptibility

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call