Abstract

Mendelian randomization is the use of genetic instrumental variables to obtain causal inferences from observational data. Two recent developments for combining information on multiple uncorrelated instrumental variables (IVs) into a single causal estimate are as follows: (i) allele scores, in which individual‐level data on the IVs are aggregated into a univariate score, which is used as a single IV, and (ii) a summary statistic method, in which causal estimates calculated from each IV using summarized data are combined in an inverse‐variance weighted meta‐analysis. To avoid bias from weak instruments, unweighted and externally weighted allele scores have been recommended. Here, we propose equivalent approaches using summarized data and also provide extensions of the methods for use with correlated IVs. We investigate the impact of different choices of weights on the bias and precision of estimates in simulation studies. We show that allele score estimates can be reproduced using summarized data on genetic associations with the risk factor and the outcome. Estimates from the summary statistic method using external weights are biased towards the null when the weights are imprecisely estimated; in contrast, allele score estimates are unbiased. With equal or external weights, both methods provide appropriate tests of the null hypothesis of no causal effect even with large numbers of potentially weak instruments. We illustrate these methods using summarized data on the causal effect of low‐density lipoprotein cholesterol on coronary heart disease risk. It is shown that a more precise causal estimate can be obtained using multiple genetic variants from a single gene region, even if the variants are correlated. © 2015 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
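The abstract states that allele score estimates can be reproduced from summarized genetic associations. Below is a minimal pure-Python sketch of one way to do this. It assumes uncorrelated variants, alleles oriented so that each variant increases the risk factor, and a first-order approximation in which the variance of variant k is proportional to σYk^-2, where σYk is the standard error of its association with the outcome (βYk). Under those assumptions the allele score estimate with weights wk is Σk wk βYk σYk^-2 / Σk wk βXk σYk^-2. The function names `ivw_estimate` and `allele_score_estimate` are hypothetical and are not taken from the paper's code.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(beta_x: Sequence[float], beta_y: Sequence[float],
                 se_y: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted (IVW) estimate with a first-order standard error.

    Equivalent to combining the ratio estimates beta_y[k] / beta_x[k]
    with weights beta_x[k]**2 / se_y[k]**2.
    """
    num = sum(bx * by / s**2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx**2 / s**2 for bx, s in zip(beta_x, se_y))
    return num / den, 1.0 / math.sqrt(den)


def allele_score_estimate(weights: Sequence[float], beta_x: Sequence[float],
                          beta_y: Sequence[float],
                          se_y: Sequence[float]) -> Tuple[float, float]:
    """Allele score estimate from summarized data (assumes uncorrelated variants).

    se_y[k]**-2 stands in for the variance of variant k, so the score's
    associations with the risk factor and outcome become weighted sums
    of the per-variant estimates.
    """
    num = sum(w * by / s**2 for w, by, s in zip(weights, beta_y, se_y))
    den = sum(w * bx / s**2 for w, bx, s in zip(weights, beta_x, se_y))
    se = math.sqrt(sum(w**2 / s**2 for w, s in zip(weights, se_y))) / abs(den)
    return num / den, se


# Illustrative (made-up) summarized data for three variants.
beta_x = [0.10, 0.05, 0.08]
beta_y = [0.021, 0.012, 0.015]
se_y = [0.005, 0.006, 0.004]

# Unweighted score: every variant gets weight 1.
print(allele_score_estimate([1.0, 1.0, 1.0], beta_x, beta_y, se_y))
# Weights equal to the beta_x estimates themselves: algebraically identical to IVW.
print(allele_score_estimate(beta_x, beta_x, beta_y, se_y))
print(ivw_estimate(beta_x, beta_y, se_y))
```

Passing externally estimated values as `weights` gives the externally weighted score. Setting the weights to the same βXk estimates used in the denominator makes the two functions agree exactly. This is a property of the formulas and does not by itself establish how the methods behave with weak instruments.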

Highlights

  • An instrumental variable (IV) can be used to estimate the causal effect of a risk factor on an outcome from observational data [1, 2]

  • Median estimates of βX (true value 0.2) over 10 000 simulations, standard deviation (SD) of estimates, median standard error (SE) of estimates, coverage (%) of the nominal 95% confidence interval for the causal parameter, and empirical power (%) to detect a causal effect based on the nominal 95% confidence interval, from a simulation study with 15 uncorrelated instrumental variables (IVs), varying the direction of confounding and the average strength of the IVs (α). Three summarized data methods are compared (allele score, summary statistic and likelihood-based), with weights taken from an external source corresponding to an independent sample of size 5000 or 50 000, or set to the true weights. (a) The ‘weights’ for the summary statistic and likelihood-based methods are used as the βXk association estimates in equations (4) and (6).

  • Median estimates, with empirical power (%) based on the nominal 95% confidence interval in brackets, over 10 000 simulations with βX = 0.2 or βX = 0, from a simulation study with 15 correlated IVs, varying the direction of confounding and the average strength of the IVs (α). Methods compared: the allele score method calculated from individual-level data, and the allele score, weighted generalized linear regression and likelihood-based methods calculated from summarized data, all with external (N = 5000) weights.
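The weighted generalized linear regression named in the second highlight extends the IVW estimate to correlated variants. The sketch below assumes the commonly used form β = (bX' Ω^-1 bX)^-1 bX' Ω^-1 bY with Ωjk = σYj σYk ρjk. Here ρjk is the correlation between variants j and k (for example, estimated from a reference panel with alleles oriented consistently), and uncertainty in the βX estimates is ignored. This is an assumption about the method's form; the paper's equations are not reproduced in this excerpt. The helper names `solve` and `ivw_correlated` are hypothetical. Pure Python is used so that the example needs no third-party packages.

```python
import math
from typing import List, Sequence, Tuple


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular.
    """
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ivw_correlated(beta_x: Sequence[float], beta_y: Sequence[float],
                   se_y: Sequence[float],
                   rho: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Generalized weighted regression of beta_y on beta_x (no intercept)."""
    n = len(beta_x)
    omega = [[se_y[j] * se_y[k] * rho[j][k] for k in range(n)] for j in range(n)]
    z = solve(omega, beta_x)  # z = Omega^-1 beta_x
    info = sum(bx * zi for bx, zi in zip(beta_x, z))  # beta_x' Omega^-1 beta_x
    # Omega is symmetric, so z' beta_y equals beta_x' Omega^-1 beta_y.
    est = sum(zi * by for zi, by in zip(z, beta_y)) / info
    return est, 1.0 / math.sqrt(info)


# Illustrative (made-up) data: three variants, the first two correlated.
beta_x = [0.10, 0.09, 0.08]
beta_y = [0.021, 0.019, 0.015]
se_y = [0.005, 0.005, 0.004]
rho = [[1.0, 0.6, 0.0],
       [0.6, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
print(ivw_correlated(beta_x, beta_y, se_y, rho))
```

When `rho` is the identity matrix, Ω is diagonal and the estimate and standard error reduce to the uncorrelated IVW formulas.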

Summary

Introduction

An instrumental variable (IV) can be used to estimate the causal effect of a risk factor on an outcome from observational data [1, 2]. As the number of IVs increases, overfitting in the first-stage regression model leads to systematic finite-sample bias in the causal estimate [9]. This bias, known as weak instrument bias, acts in the direction of the confounded observational association between the risk factor and outcome [10].

An alternative approach to combine information on multiple IVs is to use summarized data on the associations of genetic variants with risk factors and disease outcomes. These data are increasingly available from large consortia, such as the Global Lipids Genetics Consortium (GLGC) for lipid fractions [13] and DIAGRAM for type 2 diabetes [14]. If only limited individual-level data are available (for example, on the IV–risk factor relationship but not the IV–outcome relationship), summarized associations can be computed from the individual-level data, and the analysis can proceed using summarized data only.
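Reducing individual-level data to summarized associations, as described in the introduction, amounts to one regression per variant. The sketch below assumes a continuous trait, additive (0/1/2 allele count) genotype coding, and ordinary least squares with an intercept. A binary outcome such as coronary heart disease would usually call for logistic regression instead, which this sketch does not cover. The functions `simple_regression` and `summarize` are hypothetical names used for illustration.

```python
import math
from typing import List, Sequence, Tuple


def simple_regression(g: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Slope and standard error from ordinary least squares of y on g (with intercept)."""
    n = len(g)
    g_mean = sum(g) / n
    y_mean = sum(y) / n
    sxx = sum((gi - g_mean) ** 2 for gi in g)
    sxy = sum((gi - g_mean) * (yi - y_mean) for gi, yi in zip(g, y))
    slope = sxy / sxx
    intercept = y_mean - slope * g_mean
    rss = sum((yi - intercept - slope * gi) ** 2 for gi, yi in zip(g, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # residual variance / sum of squares of g
    return slope, se


def summarize(genotypes: Sequence[Sequence[float]],
              trait: Sequence[float]) -> List[Tuple[float, float]]:
    """One (beta, se) pair per variant; genotypes[k] holds variant k for every person."""
    return [simple_regression(g_k, trait) for g_k in genotypes]


# Illustrative (made-up) data: two variants measured in six people.
genotypes = [[0, 1, 2, 1, 0, 2],
             [1, 1, 0, 2, 0, 1]]
risk_factor = [1.2, 1.9, 2.8, 2.1, 1.0, 2.9]
outcome = [0.4, 0.7, 1.0, 0.8, 0.3, 1.1]

beta_x = summarize(genotypes, risk_factor)  # variant-risk factor associations
beta_y = summarize(genotypes, outcome)      # variant-outcome associations
print(beta_x)
print(beta_y)
```

Each variant is regressed separately, which matches how per-variant summary statistics are typically reported. The resulting (beta, se) pairs are the inputs a summarized-data analysis works from.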

Modelling assumptions
Method
Individual-level data allele score method
Summarized data allele score method
Likelihood-based method
Simulation study
Results
Summary statistic method
Practical implications
Correlated instrumental variables
Extension to allele score method with summarized data
Extension to summary statistic method
Example: effect of LDL-cholesterol on coronary heart disease risk
Discussion
Sample code
Summary statistic method
Likelihood-based method with correlated instrumental variables
Findings
Additional tables for simulation study with correlated instrumental variables
Additional table for applied example