Abstract

Background: It has been hypothesized that multivariate analysis and systematic detection of epistatic interactions between explanatory genotyping variables may help resolve the problem of "missing heritability" currently observed in genome-wide association studies (GWAS). However, even the simplest bivariate analysis is still held back by significant statistical and computational challenges that are often addressed by reducing the set of analysed markers. Theoretically, it has been shown that combinations of loci may exist that show weak or no effects individually, but show significant (even complete) explanatory power over phenotype when combined. Reducing the set of analysed SNPs before bivariate analysis could easily omit such critical loci.

Results: We have developed an exhaustive bivariate GWAS analysis methodology that yields a manageable subset of candidate marker pairs for subsequent analysis using other, often more computationally expensive techniques. Our model-free filtering approach is based on classification using ROC curve analysis, an alternative to much slower regression-based modelling techniques. Exhaustive analysis of studies containing approximately 450,000 SNPs and 5,000 samples requires only 2 hours using a desktop CPU or 13 minutes using a GPU (Graphics Processing Unit). We validate our methodology with analysis of simulated datasets as well as the seven Wellcome Trust Case-Control Consortium datasets that represent a wide range of real life GWAS challenges. We have identified SNP pairs that have considerably stronger association with disease than their individual component SNPs that often show negligible effect univariately. When compared against previously reported results in the literature, our methods re-detect most significant SNP-pairs and additionally detect many pairs absent from the literature that show strong association with disease. The high overlap suggests that our fast analysis could substitute for some slower alternatives.

Conclusions: We demonstrate that the proposed methodology is robust, fast and capable of exhaustive search for epistatic interactions using a standard desktop computer. First, our implementation is significantly faster than timings for comparable algorithms reported in the literature, especially as our method allows simultaneous use of multiple statistical filters with low computing time overhead. Second, for some diseases, we have identified hundreds of SNP pairs that pass formal multiple test (Bonferroni) correction and could form a rich source of hypotheses for follow-up analysis.

Availability: A web-based version of the software used for this analysis is available at http://bioinformatics.research.nicta.com.au/gwis.
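
To give a sense of the scale behind the exhaustive search and the Bonferroni correction mentioned above, the following minimal Python sketch counts the unordered SNP pairs for the approximate study size quoted (450,000 SNPs) and derives a per-pair significance threshold. The function names and the family-wise level alpha = 0.05 are illustrative assumptions, not values stated in the abstract, and the throughput figures are simply the pair count divided by the quoted run times.

```python
from math import comb


def pair_count(n_snps: int) -> int:
    """Number of unordered SNP pairs in an exhaustive bivariate scan."""
    return comb(n_snps, 2)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level under Bonferroni correction."""
    return alpha / n_tests


n_snps = 450_000                      # approximate marker count quoted in the abstract
n_pairs = pair_count(n_snps)
threshold = bonferroni_threshold(n_pairs)  # alpha = 0.05 is an assumption

# Implied average throughput from the quoted run times (2 h CPU, 13 min GPU).
cpu_pairs_per_sec = n_pairs / (2 * 3600)
gpu_pairs_per_sec = n_pairs / (13 * 60)

print(f"{n_pairs:,} pairs")
print(f"Bonferroni per-pair threshold: {threshold:.2e}")
print(f"CPU: {cpu_pairs_per_sec:.2e} pairs/s, GPU: {gpu_pairs_per_sec:.2e} pairs/s")
```

Under these assumptions the scan covers roughly 1.0 × 10^11 pairs, so a pair would need a p-value of about 4.9 × 10^-13 to pass correction.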

Highlights

  • We present a platform called Genome Wide Interaction Search (GWIS) that is based on classification and on novel, rigorous statistical tests based on receiver operating characteristic (ROC) curve analysis [13].

Summary

Introduction

Genome-wide association studies (GWAS) have discovered many underlying genetic causes of disease, but have raised important questions about standard approaches to modelling complex traits [1]. While commonly used univariate analysis techniques have detected a number of significantly associated loci, for many conditions the discovered variants do not account for the majority of the theoretical estimates of genetic heritability. It has been shown that 2-way and 3-way single nucleotide polymorphism (SNP) interactions can explain up to ~50% and ~100% of trait variance, respectively, while each SNP involved explains none [3], indicating that critical SNP pairs may be missed by the univariate analysis predominantly applied to GWAS so far. It is hypothesised that systematic detection methods may assist discovery of such potentially epistatic interactions between DNA loci.
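
The point that a SNP pair can be fully informative while each SNP alone is uninformative can be illustrated with a small toy example. The sketch below is not the GWIS statistic: it assumes a deliberately simplified two-level genotype coding (0/1 rather than the three genotypes of a real SNP), a deterministic XOR-style disease rule, and a hypothetical helper `cell_auc` that scores each genotype cell by its in-sample case fraction and reports the resulting area under the ROC curve.

```python
from collections import Counter
from itertools import product


def cell_auc(cells, is_case):
    """In-sample AUC when each genotype cell is scored by its case fraction."""
    cases = Counter(c for c, y in zip(cells, is_case) if y)
    controls = Counter(c for c, y in zip(cells, is_case) if not y)
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    score = {c: cases[c] / (cases[c] + controls[c]) for c in set(cells)}

    # Mann-Whitney form of the AUC: fraction of (case, control) sample pairs
    # in which the case's cell outscores the control's cell; ties count half.
    concordant = 0.0
    for a, b in product(score, repeat=2):
        if score[a] > score[b]:
            concordant += cases[a] * controls[b]
        elif score[a] == score[b]:
            concordant += 0.5 * cases[a] * controls[b]
    return concordant / (n_cases * n_controls)


# Toy data (assumption): 25 samples per (g1, g2) combination,
# and a sample is a case exactly when the two genotypes differ.
snp1, snp2, status = [], [], []
for g1, g2 in product((0, 1), repeat=2):
    snp1 += [g1] * 25
    snp2 += [g2] * 25
    status += [g1 != g2] * 25

auc_snp1 = cell_auc(snp1, status)                    # SNP 1 alone
auc_snp2 = cell_auc(snp2, status)                    # SNP 2 alone
auc_pair = cell_auc(list(zip(snp1, snp2)), status)   # both SNPs jointly

print(auc_snp1, auc_snp2, auc_pair)
```

In this constructed case each SNP alone gives an AUC of 0.5 (no better than chance), while the pair gives 1.0, which is the kind of interaction that filtering markers on univariate effect would discard before any bivariate analysis.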
