Abstract

Mendelian randomization uses genetic variants to make causal inferences about the effect of a risk factor on an outcome. With fine-mapped genetic data, there may be hundreds of genetic variants in a single gene region, any of which could be used to assess this causal relationship. However, using too many genetic variants in the analysis can lead to spurious estimates and inflated Type 1 error rates. But if only a few genetic variants are used, then the majority of the data is ignored and estimates are highly sensitive to the particular choice of variants. We propose an approach based on summarized data only (genetic association and correlation estimates) that uses principal components analysis to form instruments. This approach has desirable theoretical properties: it takes the totality of data into account and does not suffer from numerical instabilities. It also has good properties in simulation studies: it is not particularly sensitive to varying the genetic variants included in the analysis or the genetic correlation matrix, and it does not have greatly inflated Type 1 error rates. Overall, the method gives estimates that are less precise than those from variable selection approaches (such as using a conditional analysis or pruning approach to select variants), but are more robust to seemingly arbitrary choices in the variable selection step. Methods are illustrated by an example using genetic associations with testosterone for 320 genetic variants to assess the effect of sex hormone related pathways on coronary artery disease risk, in which variable selection approaches give inconsistent inferences.

Highlights

  • In a Mendelian randomization investigation, genetic variants that are instrumental variables for a given risk factor are used to assess the causal effect of the risk factor on an outcome (Burgess & Thompson, 2015; Davey Smith & Ebrahim, 2003)

  • As in the previous simulations, estimates from the pruning approach became more precise as the threshold correlation increased; Type 1 error rates were above nominal levels for ρ = 0.8 even when the association estimates were not rounded

  • We first connected previously known results together to show from theoretical arguments that genetic variants included in a Mendelian randomization analysis should be those that are associated with the risk factor in a conditional analysis

Summary

BACKGROUND

In a Mendelian randomization investigation, genetic variants that are instrumental variables for a given risk factor are used to assess the causal effect of the risk factor on an outcome (Burgess & Thompson, 2015; Davey Smith & Ebrahim, 2003). An association between such a genetic variant and the outcome is indicative of a causal effect of the risk factor on the outcome (Didelez & Sheehan, 2007; Lawlor, Harbord, Sterne, Timpson, & Davey Smith, 2008). When there are multiple uncorrelated genetic variants that are instrumental variables for the same risk factor, power to detect. When genetic variants are correlated, it is not clear how to choose which variants to include in the analysis to obtain the most efficient estimate possible without the analysis suffering from numerical instabilities when there are large numbers of highly correlated candidate variants (such as with fine-mapped genetic data).
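As an illustration of the principal components idea described in the abstract, the following is a minimal sketch in Python with NumPy, not the authors' software code. It assumes summarized data only: variant-risk factor associations, variant-outcome associations with their standard errors, and a variant correlation matrix. The function name `pca_ivw`, the particular weighting of the correlation matrix, the 99% variance threshold, and all toy numbers are assumptions made for this example. The components are used as instruments in an inverse-variance weighted (IVW) estimate that accounts for correlation.

```python
import numpy as np

def pca_ivw(bx, by, se_by, rho, var_explained=0.99):
    """Hypothetical helper: IVW estimate using principal components as instruments.

    bx     variant associations with the risk factor
    by     variant associations with the outcome
    se_by  standard errors of the outcome associations
    rho    variant correlation matrix
    """
    bx = np.asarray(bx, dtype=float)
    by = np.asarray(by, dtype=float)
    se_by = np.asarray(se_by, dtype=float)
    rho = np.asarray(rho, dtype=float)

    # Assumed weighting: scale the correlation matrix so that variants with
    # strong, precisely estimated associations dominate the components.
    w = bx / se_by
    phi = np.outer(w, w) * rho

    # Eigendecomposition of the symmetric weighted matrix, largest first.
    eigvals, eigvecs = np.linalg.eigh(phi)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]

    # Keep the fewest components reaching the variance threshold.
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(cum, var_explained)) + 1, len(eigvals))
    loadings = eigvecs[:, :k]

    # Project the association estimates and their covariance onto the components.
    bx_pc = loadings.T @ bx
    by_pc = loadings.T @ by
    omega = np.outer(se_by, se_by) * rho
    omega_pc = loadings.T @ omega @ loadings

    # Generalized (correlation-aware) IVW estimate on the projected data.
    omega_inv_bx = np.linalg.solve(omega_pc, bx_pc)
    denom = bx_pc @ omega_inv_bx
    estimate = (omega_inv_bx @ by_pc) / denom
    std_error = np.sqrt(1.0 / denom)
    return float(estimate), float(std_error), k

# Toy data (made up): six variants with correlation decaying with distance.
idx = np.arange(6)
toy_rho = 0.95 ** np.abs(idx[:, None] - idx[None, :])
toy_bx = np.array([0.10, 0.12, 0.11, 0.09, 0.13, 0.10])
toy_se = np.full(6, 0.02)
toy_by = 0.5 * toy_bx  # outcome associations proportional to bx

print(pca_ivw(toy_bx, toy_by, toy_se, toy_rho))
```

Because the number of retained components is typically far smaller than the number of variants, the matrix that must be inverted stays small and well conditioned even when the original correlation matrix is close to singular.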

Theoretical viewpoint
Estimating a causal effect using summarized data
Scope of paper
MOTIVATING EXAMPLE
CHOOSING THE RIGHT NUMBER OF VARIANTS
Too many variants
Too few variants
Sensitivity to choice of genetic variants
Sensitivity to correlation matrix
Rounding of association estimates
Results
DISCUSSION
Comparison with previous work
Software code
Proof of equality of 2SLS and IVW estimates