Abstract

Participants in epidemiologic and genetic studies are rarely true random samples of the populations they are intended to represent, and both known and unknown factors can influence participation in a study (known as selection into a study). The circumstances in which selection causes bias in an instrumental variable (IV) analysis are not widely understood by practitioners of IV analyses. We use directed acyclic graphs (DAGs) to depict assumptions about the selection mechanism (factors affecting selection) and show how DAGs can be used to determine when a two-stage least squares IV analysis is biased by different selection mechanisms. Through simulations, we show that selection can result in a biased IV estimate with substantial confidence interval (CI) undercoverage, and the level of bias can differ between instrument strengths, a linear and nonlinear exposure-instrument association, and a causal and noncausal exposure effect. We present an application from the UK Biobank study, which is known to be a selected sample of the general population. Of interest was the causal effect of staying in school at least 1 extra year on the decision to smoke. Based on 22,138 participants, the two-stage least squares exposure estimates were very different between the IV analysis ignoring selection and the IV analysis which adjusted for selection (e.g., risk differences, 1.8% [95% CI, -1.5%, 5.0%] and -4.5% [95% CI, -6.6%, -2.4%], respectively). We conclude that selection bias can have a major effect on an IV analysis, and further research is needed on how to conduct sensitivity analyses when selection depends on unmeasured data.

Highlights

  • When selection is completely at random, depends on Z only (Figure 1a), or depends on U only (Figure 1b), βX2SLS is not biased by selection.

  • In these cases, the IV assumptions remain true in the selected sample; for example, within the selected sample, Z is associated with neither C nor U because pathways Z → X ← U and Z → X ← C remain blocked by collider X, and pathways Z → X → Y ← U and Z → X → Y ← C remain blocked by collider Y.

  • When selection depends on Z + C (i.e., on both Z and C; Figure 1c), βX2SLS is biased because the Y − Z association is confounded by C in the selected sample.
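The source of the bias in the highlights above can be made explicit with a short derivation. This is only a sketch under assumptions not stated on this page: a single instrument Z, a linear outcome model Y = β_X X + β_C C + β_U U + ε with ε independent of Z, C, U, and S, and the coefficient names β_C and β_U, which are introduced here for illustration. With one instrument, the 2SLS estimate is the ratio of the sample covariances of Z with Y and of Z with X, so in the selected sample (S = 1):

```latex
\hat{\beta}_X^{2SLS}
  = \frac{\widehat{\operatorname{cov}}(Z, Y \mid S = 1)}
         {\widehat{\operatorname{cov}}(Z, X \mid S = 1)}
  \;\xrightarrow{\;p\;}\;
  \beta_X
  + \frac{\beta_C \operatorname{cov}(Z, C \mid S = 1)
        + \beta_U \operatorname{cov}(Z, U \mid S = 1)}
         {\operatorname{cov}(Z, X \mid S = 1)}
```

When selection is completely at random or depends on Z only or U only, both covariances in the numerator of the second term are zero and the estimate converges to β_X. When selection depends on Z + C, cov(Z, C | S = 1) is generally nonzero, which is the confounding of the Y − Z association by C described above.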


Summary

Detailed discussion on selection mechanisms

Selection completely at random, or depending on Z or U

When selection is completely at random, depends on Z only (Figure 1a), or depends on U only (Figure 1b), βX2SLS is not biased by selection.

Selection depending on Z and C

When selection depends on Z + C (Figure 1c), βX2SLS is biased. Explanation: selection implies conditioning on collider S, which opens the noncausal pathway Z → S ← C → Y [1, 2], so the Y − Z association is confounded by C in the selected sample. The Y − Z association is not confounded by U in the selected sample (i.e., Z remains independent of U) because all pathways between Z and U remain blocked by a collider (e.g., Z → X ← U).

Suppose selection depends on Z + C but confounder C has been measured with error, with the mismeasured version denoted by C∗ (DAG not shown). Conditioning on C∗ would not block the noncausal pathway between Z and Y via S and C, so selection bias would not be eliminated; that is, βX|C2SLS (the 2SLS estimate of the exposure effect conditional on C) would remain biased by selection. The Y − Z association would remain confounded through the measurement error in C, which is known as “residual confounding”.
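The selection mechanisms discussed above can be illustrated with a small simulation. This is a minimal sketch and not the authors' simulation study: the data-generating model, the effect sizes, the logistic selection model, and the names `two_sls` and `run_demo` are all assumptions made for illustration. It generates an instrument Z, a measured confounder C, an unmeasured confounder U, an exposure X, and an outcome Y with a true exposure effect of 0.5, then compares 2SLS estimates in the full sample and under several selection mechanisms.

```python
import numpy as np


def two_sls(z, x, y, covariate=None):
    """2SLS estimate of the effect of x on y with a single instrument z.

    If `covariate` is given, it is included in both stages.
    """
    n = len(z)
    exog = [np.ones(n)] if covariate is None else [np.ones(n), covariate]
    w = np.column_stack(exog)             # intercept (and optional covariate)
    first = np.column_stack([z, w])       # first-stage design matrix
    coef1, *_ = np.linalg.lstsq(first, x, rcond=None)
    x_hat = first @ coef1                 # fitted exposure
    second = np.column_stack([x_hat, w])  # second-stage design matrix
    coef2, *_ = np.linalg.lstsq(second, y, rcond=None)
    return coef2[0]                       # coefficient on the fitted exposure


def run_demo(n=200_000, beta_x=0.5, seed=0):
    """Return 2SLS estimates under hypothetical selection mechanisms."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                # instrument
    c = rng.normal(size=n)                # measured confounder
    u = rng.normal(size=n)                # unmeasured confounder
    x = 0.5 * z + c + u + rng.normal(size=n)
    y = beta_x * x + c + u + rng.normal(size=n)
    c_star = c + rng.normal(size=n)       # C measured with error (assumed form)

    def selected(score):
        # Assumed logistic selection model: P(S = 1) = expit(score)
        return rng.random(n) < 1.0 / (1.0 + np.exp(-score))

    s_z, s_u, s_zc = selected(z), selected(u), selected(z + c)
    return {
        "full sample": two_sls(z, x, y),
        "selection on Z": two_sls(z[s_z], x[s_z], y[s_z]),
        "selection on U": two_sls(z[s_u], x[s_u], y[s_u]),
        "selection on Z + C": two_sls(z[s_zc], x[s_zc], y[s_zc]),
        "selection on Z + C, adjusted for C": two_sls(
            z[s_zc], x[s_zc], y[s_zc], c[s_zc]
        ),
        "selection on Z + C, adjusted for C*": two_sls(
            z[s_zc], x[s_zc], y[s_zc], c_star[s_zc]
        ),
    }


for label, estimate in run_demo().items():
    print(f"{label:40s} {estimate: .3f}")
```

Under these assumed settings, the estimates for the full sample, selection on Z, and selection on U should lie close to the true value of 0.5, whereas selection on Z + C should give a clearly different estimate. Adjusting for C in both stages should recover a value near 0.5, and adjusting for the mismeasured C∗ should remove only part of the bias. The size of the bias depends entirely on the assumed parameter values.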

Selection depending on X
Selection depending on Y only
Exposure effect conditional on C
Remaining selection mechanisms for our IV example
Methods
Results
Detailed description of the applied example
Calculation of the weights
Further comments
Simulation study based on the applied example