Screening-testing approaches for gene-gene and gene-environment interactions using independent statistics

Joshua Millstein

doi:10.3389/fgene.2013.00306

Abstract

Next-generation sequencing and other high-throughput technologies have made it feasible to characterize millions of sequence variations on large numbers of study participants. But when it comes to identifying a small number of these genetic features (or feature sets) that are associated with a disease trait, the investigator is faced with a formidable multiple-testing challenge. It can be thought of as a signal-to-noise problem, where the large number of unrelated genetic features tends to drown out the faint signal of the small number of biologically relevant features. The theoretical underpinnings of an emerging class of statistical methods for genomic studies, two-stage procedures for both gene-gene and gene-environment interactions, have recently been described in a remarkable article (Dai et al., 2012). The key idea is that the dimensionality of multiple testing in genomics can be reduced by screening features to be tested with an independent statistic in the same dataset, thereby mitigating the multiple-testing problem and increasing power to detect effects. In other words, the noise is reduced, allowing the relevant signal to be more easily detected. These methods will likely gain importance as high-throughput technologies continue to yield exponentially increasing amounts of information per sample and per research dollar spent. Dai et al. couched their paper in the context of gene-environment interactions only. However, it is worth noting that the theoretical properties detailed by Dai et al. apply not just to the search for gene-environment interactions (GxE), but also to (epistatic) interactions between genetic variants (GxG), since in constructing these hypothesis tests, both “gene” and “environment” features are treated analogously as discrete or continuous variables in models designed to identify associations with a disease trait.
A notable exception is when the approach depends on the environmental exposure being a randomized treatment, allowing additional assumptions to be made. One such screening-testing interaction approach is designed for a case-control study where the investigator is interested in identifying GxG or GxE pairs involved in interactions (Millstein et al., 2006; Murcray et al., 2009; Dai et al., 2012; Lewinger et al., 2013). There is an assumption that each pair of features considered is independent in the general population, and only if a dependence is found in the pooled case-control sample (the screening stage) is the pair tested in a formal model that includes an interaction term (the testing stage), e.g., logit(P[D]) = α + β1∗SNP1 + β2∗SNP2 + β3∗SNP1∗SNP2, where β3 is the interaction parameter and D indicates disease. The interaction parameter can be tested alone or in a multi-degree-of-freedom test of one or both main effects together with the interaction, an approach that was generally found to be more powerful (Millstein et al., 2006; Kraft et al., 2007). An important characteristic of the approach is that even if the independence assumption is not justified, type I error in the testing stage will still be properly controlled. This approach is perhaps more general and more powerful than previously appreciated. The screening procedure appears to be sensitive to both main effects and interactions, not just interactions, as claimed in prior work. The implication is that the approach is less specific to interactions and correspondingly more powerful when main effects are present. In fact, it may be capable of detecting weak interactions coupled with weak main effects.
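The two stages described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions that are not taken from the cited papers: each feature is coded as a binary carrier indicator, the screening statistic is a Pearson chi-square test of association in the pooled sample, the testing stage is a 1-df Wald test of β3 in the saturated logistic model (which has a closed form for binary features), and the testing-stage threshold is a Bonferroni correction over the pairs that pass screening. The function names, thresholds, and cell counts are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Cell key is (d, a, b): disease status, carrier status at feature 1 and at feature 2.
Counts = Dict[Tuple[int, int, int], int]


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def screening_pvalue(counts: Counts) -> float:
    """Screening stage: test feature-feature association in the pooled sample."""
    pooled = {(a, b): counts[(0, a, b)] + counts[(1, a, b)]
              for a in (0, 1) for b in (0, 1)}
    total = sum(pooled.values())
    row = {a: pooled[(a, 0)] + pooled[(a, 1)] for a in (0, 1)}
    col = {b: pooled[(0, b)] + pooled[(1, b)] for b in (0, 1)}
    stat = 0.0
    for (a, b), observed in pooled.items():
        expected = row[a] * col[b] / total
        stat += (observed - expected) ** 2 / expected
    return chi2_1df_pvalue(stat)


def interaction_pvalue(counts: Counts) -> float:
    """Testing stage: Wald test of beta3 in the saturated logistic model.

    Assumes every cell count is positive.
    """
    log_odds = {(a, b): math.log(counts[(1, a, b)] / counts[(0, a, b)])
                for a in (0, 1) for b in (0, 1)}
    beta3 = (log_odds[(1, 1)] - log_odds[(1, 0)]
             - log_odds[(0, 1)] + log_odds[(0, 0)])
    std_err = math.sqrt(sum(1.0 / c for c in counts.values()))
    return chi2_1df_pvalue((beta3 / std_err) ** 2)


def screen_then_test(pairs: Dict[str, Counts],
                     alpha_screen: float = 0.05,
                     alpha_test: float = 0.05) -> List[str]:
    """Return the pairs that pass screening and then the corrected interaction test."""
    passed = [name for name, c in pairs.items()
              if screening_pvalue(c) < alpha_screen]
    if not passed:
        return []
    threshold = alpha_test / len(passed)  # correct only for pairs that were tested
    return [name for name in passed
            if interaction_pvalue(pairs[name]) < threshold]


# Hypothetical counts: a pair with no signal and a pair whose features
# are associated among cases but not among controls.
null_pair = {(d, a, b): 100 for d in (0, 1) for a in (0, 1) for b in (0, 1)}
signal_pair = {(0, 0, 0): 250, (0, 0, 1): 250, (0, 1, 0): 250, (0, 1, 1): 250,
               (1, 0, 0): 400, (1, 0, 1): 100, (1, 1, 0): 100, (1, 1, 1): 400}

detected = screen_then_test({"null": null_pair, "signal": signal_pair})
print(detected)
```

Because the testing-stage correction divides only by the number of pairs that survive screening, a stricter screening threshold leaves fewer tests to correct for, which is the dimensionality reduction the text refers to.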
Some authors (Murcray et al., 2009; Dai et al., 2012; Lewinger et al., 2013) have attributed the statistical power of the screening procedure solely to an association in cases due to an interaction in the underlying population (non-zero β3, or more correctly, a departure from multiplicativity on a relative risk scale), as in the case-only interaction analysis (Piegorsch et al., 1994). According to this view, controls only contribute noise to the screening procedure because the factors are independent in this population. Further, if the two features contribute marginal disease risks and a multiplicative relative risk model describes their joint risk, then dependencies will not be induced among cases. The idea is that if there is independence in cases and independence in controls, then it should follow that there would be independence in the pooled case-control sample, but this is not necessarily the case. It has not been adequately appreciated that when cases and controls are pooled, main effects can contribute a substantial increase in power to capture disease-related feature pairs with the above screening procedure. Interestingly, the complex conditioning on disease status inherent in pooling of cases and controls can induce dependencies and thus increase power of the screening procedure when main effects are present. As proof of concept, consider the relatively simple relative risk model, log(P[D]) = λ + β1∗SNP1 + β2∗SNP2 + β3∗SNP1∗SNP2, where exp(λ) is the baseline risk, the two SNPs have equal relative risks per allele, i.e., β1 = β2, there is a weak interaction (small β3), and equal
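The claim that pooling can induce dependence even without an interaction can be checked with an exact calculation. The sketch below is an illustration only, not the author's own worked example. It uses the log-linear relative risk model log(P[D]) = λ + β1∗SNP1 + β2∗SNP2 + β3∗SNP1∗SNP2 with β3 set to zero, and every numeric value is an assumption chosen for illustration: additive allele counts under Hardy-Weinberg proportions, a minor allele frequency of 0.3 at both SNPs, a baseline risk of 0.01, a per-allele relative risk of 1.5, and a pooled sample made up of half cases and half controls.

```python
import math
from itertools import product
from typing import Dict, Tuple

Joint = Dict[Tuple[int, int], float]


def hwe(q: float) -> Dict[int, float]:
    """Genotype probabilities (0, 1, 2 minor alleles) under Hardy-Weinberg."""
    return {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q ** 2}


def normalise(weights: Joint) -> Joint:
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def joint_given_status(q1: float, q2: float, lam: float,
                       b1: float, b2: float, b3: float) -> Tuple[Joint, Joint]:
    """Joint genotype distribution in cases and in controls.

    The two SNPs are independent in the general population; disease risk
    follows log(P[D]) = lam + b1*g1 + b2*g2 + b3*g1*g2.
    """
    p1, p2 = hwe(q1), hwe(q2)
    cases, controls = {}, {}
    for g1, g2 in product(range(3), repeat=2):
        risk = math.exp(lam + b1 * g1 + b2 * g2 + b3 * g1 * g2)
        prior = p1[g1] * p2[g2]
        cases[(g1, g2)] = prior * risk
        controls[(g1, g2)] = prior * (1 - risk)
    return normalise(cases), normalise(controls)


def covariance(dist: Joint) -> float:
    """Covariance between the two allele counts under a joint distribution."""
    mean1 = sum(g1 * p for (g1, _), p in dist.items())
    mean2 = sum(g2 * p for (_, g2), p in dist.items())
    mean12 = sum(g1 * g2 * p for (g1, g2), p in dist.items())
    return mean12 - mean1 * mean2


def pool(cases: Joint, controls: Joint, case_fraction: float = 0.5) -> Joint:
    """Mixture of cases and controls, as in a pooled case-control sample."""
    return {k: case_fraction * cases[k] + (1 - case_fraction) * controls[k]
            for k in cases}


# Illustrative parameters (assumptions): main effects only, no interaction.
cases, controls = joint_given_status(q1=0.3, q2=0.3, lam=math.log(0.01),
                                     b1=math.log(1.5), b2=math.log(1.5), b3=0.0)
cov_cases = covariance(cases)
cov_controls = covariance(controls)
cov_pooled = covariance(pool(cases, controls))
print(cov_cases, cov_controls, cov_pooled)
```

With these inputs the covariance among cases is zero up to rounding, the covariance among controls is close to zero, and the pooled covariance is clearly positive. The reason is the law of total covariance: the pooled covariance includes a term proportional to the product of the case-control differences in mean allele count at the two SNPs, and that product is non-zero whenever both SNPs have main effects.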



Journal: Frontiers in Genetics
Publication Date: Jan 1, 2013
Citations: 4
License type: cc-by
