Abstract

Background: Genome-wide association studies provide important insights into the genetic component of disease risks. However, an existing challenge is how to incorporate collective effects of interactions beyond the level of independent single nucleotide polymorphism (SNP) tests. While methods considering each SNP pair separately have provided insights, a large portion of expected heritability may reside in higher-order interaction effects.

Results: We describe an inference approach (discrete discriminant analysis; DDA) designed to probe collective interactions while treating both genotypes and phenotypes as random variables. The genotype distributions in case and control groups are modeled separately based on empirical allele frequency and covariance data, and the differences between the two models yield disease risk parameters. We compared pairwise tests and collective inference methods, the latter based both on DDA and logistic regression. Analyses using simulated data demonstrated that significantly higher sensitivity and specificity can be achieved with collective inference in comparison to pairwise tests, and with DDA in comparison to logistic regression. Using age-related macular degeneration (AMD) data, we demonstrated two possible applications of DDA. In the first application, a genome-wide SNP set is reduced to a small number (∼100) of variants via filtering, and SNP pairs with significant interactions are identified. We found that interactions between SNPs with highest AMD association were epigenetically active in the liver, adipocytes, and mesenchymal stem cells. In the other application, multiple groups of SNPs were formed from the genome-wide data and their relative strengths of association were compared using cross-validation. This analysis allowed us to discover novel collections of loci for which interactions between SNPs play significant roles in their disease association. In particular, we considered pathway-based groups of SNPs containing up to ∼10,000 variants in each group. In addition to pathways related to complement activation, our collective inference pointed to pathway groups involved in phospholipid synthesis, oxidative stress, and apoptosis, consistent with the AMD pathogenesis mechanism where the dysfunction of retinal pigment epithelium cells plays central roles.

Conclusions: The simultaneous inference of collective interaction effects within a set of SNPs has the potential to reveal novel aspects of disease association.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2871-3) contains supplementary material, which is available to authorized users.

Highlights

  • Genome-wide association studies provide important insights into the genetic component of disease risks

  • To deal more directly with genome-wide data in an unbiased fashion, we describe a second mode of discrete discriminant analysis (DDA) application where ∼10⁶ single nucleotide polymorphisms (SNPs) are grouped into subsets (∼10³ or more) based on phenotype-independent criteria, the collective inference is applied to each subset, and their relative importance in disease association is evaluated based on cross-validation prediction score

  • Independent SNPs: When interactions between the loci are turned off, DDA can be solved analytically, whereas logistic regression is always numerical. We first compared this special case of DDA and logistic regression without interaction and found the odds ratio and power to be identical for all conditions for binary models (Additional file 2: Figure S1), which implies that the effect of marginal genotype distributions ignored in logistic regression is negligible for a single non-interacting locus
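
As an illustration of why the non-interacting case needs no numerical fitting, the sketch below computes a single-locus effect in closed form. It assumes a binary genotype coding (a = 0 or 1), takes the per-group parameter to be the log odds of the coded genotype, and takes the disease risk parameter to be the case-minus-control difference. The function names, the pseudocount option, and the counts are hypothetical and are not taken from the paper or its software.

```python
import math


def locus_field(n_ref, n_alt, pseudocount=0.0):
    """Log odds h = ln(f / (1 - f)) of the coded genotype (a = 1) in one
    phenotype group, where f is its empirical frequency in that group.

    A positive pseudocount avoids log(0) when a genotype is unobserved.
    """
    f = (n_alt + pseudocount) / (n_ref + n_alt + 2.0 * pseudocount)
    return math.log(f / (1.0 - f))


def independent_locus_log_odds(case_counts, control_counts, pseudocount=0.0):
    """Closed-form single-locus effect: difference of the case and control
    log odds, which equals the log odds ratio of the 2x2 table."""
    h_case = locus_field(*case_counts, pseudocount=pseudocount)
    h_control = locus_field(*control_counts, pseudocount=pseudocount)
    return h_case - h_control


# Hypothetical counts, not from the paper: (number with a = 0, number with a = 1)
case_counts = (60, 40)
control_counts = (80, 20)

beta = independent_locus_log_odds(case_counts, control_counts)
odds_ratio = math.exp(beta)
print(f"log odds ratio = {beta:.4f}, odds ratio = {odds_ratio:.4f}")
```

For a single binary predictor, the unpenalized maximum-likelihood slope of a logistic regression also equals the sample log odds ratio, so under these assumptions the two approaches should agree on the odds ratio, in line with the comparison reported above.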


Summary

Introduction

Genome-wide association studies provide important insights into the genetic component of disease risks. It has recently been shown that meta-analyses involving increasingly large sample sizes can yield many additional loci of statistical significance [10, 11]. Another potential source of such ‘missing heritability’ is the contribution of rare variants not detected by population-based genotyping platforms. Recent studies based on exome and whole-genome sequencing data combined with statistical tests including burden tests [12], the C-alpha test [13], and the sequence kernel association test [14] are beginning to address such possibilities. The limitation of independent single nucleotide polymorphism (SNP) analyses, where each locus is considered separately to evaluate its association with disease using trend tests or logistic regression models [15], and possible effects of epistasis are expected to contribute to the limited degree of biological effects uncovered so far.
