Formulating causal questions and principled statistical answers.

Els Goetghebeur, Saskia Le Cessie, Ingeborg Waernbaum, Bianca De Stavola, Erica E. M. Moodie

doi:10.1002/sim.8741

Abstract

Although review papers on causal inference methods are now available, there is a lack of introductory overviews on what they can render and on the guiding criteria for choosing one particular method. This tutorial gives an overview in situations where an exposure of interest is set at a chosen baseline (“point exposure”) and the target outcome arises at a later time point. We first phrase relevant causal questions and make a case for being specific about the possible exposure levels involved and the populations for which the question is relevant. Using the potential outcomes framework, we describe principled definitions of causal effects and of estimation approaches, classified according to whether they invoke the no unmeasured confounding assumption (including outcome regression and propensity score‐based methods) or an instrumental variable with added assumptions. We mainly focus on continuous outcomes and causal average treatment effects. We discuss interpretation, challenges, and potential pitfalls, and illustrate application using a “simulation learner” that mimics the effect of various breastfeeding interventions on a child's later development. This involves a typical simulation component with generated exposure, covariate, and outcome data inspired by a randomized intervention study. The simulation learner further generates various (linked) exposure types with a set of possible values per observation unit, from which observed as well as potential outcome data are generated. It thus provides true values of several causal effects. R code for data generation and analysis is available on www.ofcaus.org, where SAS and Stata code for analysis is also provided.

Highlights

The literature on causal inference methods and their applications is expanding at an extraordinary rate.
Since individual-level causal effects can never be observed, we focus on expected causal contrasts in certain populations.
We applied the methods discussed in the previous section to estimate the ATE and the ATT of A1, A2, and A3 on weight at 3 months using the data from the simulation learner PROBITsim.
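
To make the ATE/ATT distinction concrete, the following is a minimal Python sketch in the spirit of a simulation learner. It is not PROBITsim and not the authors' code: the data-generating model, its coefficients, and the function names `simulate`, `fit_logistic`, `estimate_effects`, and `run_demo` are all assumptions made for illustration. Because both potential outcomes are generated for every unit, the true average treatment effect (ATE, the mean of Y(1) − Y(0) over everyone) and the true average treatment effect in the treated (ATT, the same mean restricted to the exposed) are known, and can be compared with an outcome-regression estimate and an inverse-probability-weighted estimate that both rely on no unmeasured confounding.

```python
import numpy as np


def simulate(n=20_000, seed=0):
    """Hypothetical data-generating model: one confounder, binary exposure."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                     # measured confounder
    p = 1.0 / (1.0 + np.exp(-0.8 * x))         # true propensity P(A=1 | X)
    a = rng.binomial(1, p)                     # observed exposure
    y0 = 1.0 + 0.5 * x + rng.normal(size=n)    # potential outcome if unexposed
    y1 = y0 + 2.0 + 1.0 * x                    # potential outcome if exposed
    y = np.where(a == 1, y1, y0)               # only one of the two is observed
    return x, a, y, y0, y1


def fit_logistic(X, a, iters=25):
    """Logistic regression by Newton-Raphson (keeps the sketch numpy-only)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        beta += np.linalg.solve((X * w[:, None]).T @ X, X.T @ (a - p))
    return beta


def estimate_effects(x, a, y):
    """Naive, outcome-regression, and propensity-weighted estimates."""
    ones = np.ones(len(y))
    treated = a == 1

    # Naive contrast of observed means: confounded by x.
    naive = y[treated].mean() - y[~treated].mean()

    # Outcome regression: fit E[Y | A, X] with an A*X interaction, then
    # average the predicted contrast over everyone (ATE) or the exposed (ATT).
    design = np.column_stack([ones, a, x, a * x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    contrast = coef[1] + coef[3] * x
    ate_reg, att_reg = contrast.mean(), contrast[treated].mean()

    # Inverse probability weighting with an estimated propensity score.
    Xp = np.column_stack([ones, x])
    e = 1.0 / (1.0 + np.exp(-Xp @ fit_logistic(Xp, a)))
    w1, w0 = a / e, (1 - a) / (1 - e)
    ate_ipw = np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)
    w0_att = (1 - a) * e / (1 - e)             # reweight unexposed to the exposed
    att_ipw = y[treated].mean() - np.sum(w0_att * y) / np.sum(w0_att)

    return {"naive": naive, "ate_reg": ate_reg, "att_reg": att_reg,
            "ate_ipw": ate_ipw, "att_ipw": att_ipw}


def run_demo():
    x, a, y, y0, y1 = simulate()
    res = estimate_effects(x, a, y)
    res["true_ate"] = (y1 - y0).mean()             # known only in simulation
    res["true_att"] = (y1 - y0)[a == 1].mean()
    return res


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name:>9s}: {value:.3f}")
```

In this assumed setup the effect grows with the confounder, and units with larger values of it are more likely to be exposed, so the ATT exceeds the ATE and the naive contrast overstates both. The two adjusted estimators recover the targets here only because the outcome and propensity models are correctly specified by construction.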

Summary

Introduction

The literature on causal inference methods and their applications is expanding at an extraordinary rate. (2) We may seek to learn about the effect of treatments received in these trials, beyond the pragmatic effect of treatment assigned. This calls for an exploration of compliance with the assignment and for follow-up exposure data, that is, nonrandomized components of treatment received. (4) A wealth of patient data is being gathered in disease registries and other electronic patient records; these often contain more variables, larger sample sizes, and greater population coverage than an RCT. These needs and opportunities push scientists to seek causal answers in observational settings with larger and less selective populations, with longer follow-up, and with a wider range of exposures and outcome types (including quality of life and adverse events).
