Abstract

Integrating context into the data-generating process (DGP) of observational studies is driving innovation in causal structural frameworks. A common older approach was to pre-specify a DGP and use directed acyclic graphs (DAGs) to select a minimally sufficient subset of covariates for adjustment. Advances in Targeted Learning (TL) allow us to estimate the DGP via super learners (SL), which are weighted ensembles of machine learning algorithms. The TL approach draws on a larger predictor space and reduces the potential for model misspecification, improving reproducibility. We compare TL and DAG-directed approaches using synthetic data generated from a realistic in silico DGP, and we extend the examples to Real-World Data (RWD). We simulate data from a given DAG and compare three methods: 1) targeted maximum likelihood estimation (TMLE), 2) a traditional treatment-weighting approach (stabilized propensity score weights) with a correctly specified model [matching our DGP], and 3) method 2) repeated under several forms of model misspecification. We then reproduce these analyses on a real-world example. We also assess performance when additional, more advanced machine-learning libraries are included in the super learner. In our examples, TMLE improves estimation of propensity scores and reduces bias. This latter effect is particularly apparent in our RWD example, where balancing via inverse probability of treatment weights is problematic because of the pathological distribution of the propensity scores over the region of common support. A correctly specified model, one that contains the process that generated the data, is crucial for obtaining unbiased estimates of the true average treatment effect (ATE), so knowledge of the DGP should be incorporated into the model. In practice, however, models are often not selected based on the DGP, and different modeling choices can yield different answers to the same research question. TMLE provides a rigorous foundation that is less susceptible to this kind of manipulation and is robust to model misspecification.
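The comparison described above can be illustrated with a minimal Python sketch. It assumes a hypothetical DAG with two confounders (W1, W2), a binary treatment A and a continuous outcome Y whose true ATE is 2, and it replaces the super learner with plain parametric models (a logistic propensity model fitted by Newton-Raphson and an ordinary-least-squares outcome model) so that it runs with NumPy alone. The TMLE step uses a simple linear fluctuation on the clever covariate; TMLE software commonly rescales Y to [0, 1] and uses a logistic fluctuation instead. All function names, coefficients and the 0.01/0.99 truncation bounds are illustrative assumptions, not the authors' setup.

```python
import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def fit_logistic(X, y, n_iter=25):
    """Unpenalized logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def propensity(W_ps, a):
    """Fitted P(A = 1 | W), truncated to [0.01, 0.99] (hypothetical bounds)."""
    X = np.column_stack([np.ones(len(a)), W_ps])
    return np.clip(expit(X @ fit_logistic(X, a)), 0.01, 0.99)


def simulate(n, rng):
    """Hypothetical DAG: W1, W2 -> A; W1, W2, A -> Y. True ATE = 2."""
    w1 = rng.normal(size=n)
    w2 = rng.binomial(1, 0.5, size=n).astype(float)
    a = rng.binomial(1, expit(-0.5 + 0.8 * w1 + 0.5 * w2)).astype(float)
    y = 1.0 + 2.0 * a + w1 + 0.5 * w2 + rng.normal(size=n)
    return w1, w2, a, y


def stabilized_iptw_ate(W_ps, a, y):
    """Weighted difference in means with stabilized weights P(A=a) / P(A=a | W)."""
    g = propensity(W_ps, a)
    p_treat = a.mean()
    sw = np.where(a == 1, p_treat / g, (1.0 - p_treat) / (1.0 - g))
    mu1 = np.sum(sw * a * y) / np.sum(sw * a)
    mu0 = np.sum(sw * (1 - a) * y) / np.sum(sw * (1 - a))
    return mu1 - mu0


def tmle_ate(W_ps, W_out, a, y):
    """TMLE of the ATE: OLS initial outcome fit plus a linear fluctuation."""
    n = len(y)
    g = propensity(W_ps, a)

    def design(av):
        # Outcome-model design matrix [1, A, W] evaluated at treatment values av
        return np.column_stack([np.ones(n), av, W_out])

    beta_q, *_ = np.linalg.lstsq(design(a), y, rcond=None)
    q_a = design(a) @ beta_q
    q_1 = design(np.ones(n)) @ beta_q
    q_0 = design(np.zeros(n)) @ beta_q
    # Clever covariate H(A, W) = A / g(W) - (1 - A) / (1 - g(W))
    h_a = a / g - (1.0 - a) / (1.0 - g)
    # Fluctuation: least-squares regression of the residual on H, no intercept
    eps = np.sum(h_a * (y - q_a)) / np.sum(h_a ** 2)
    # Targeted update of both counterfactual predictions, then plug-in ATE
    q_1_star = q_1 + eps / g
    q_0_star = q_0 - eps / (1.0 - g)
    return np.mean(q_1_star - q_0_star)


rng = np.random.default_rng(0)
w1, w2, a, y = simulate(20_000, rng)
W_full = np.column_stack([w1, w2])  # correct adjustment set
W_miss = w2[:, None]                # omits the confounder W1

ate_iptw_ok = stabilized_iptw_ate(W_full, a, y)
ate_iptw_bad = stabilized_iptw_ate(W_miss, a, y)
ate_tmle_bad_g = tmle_ate(W_miss, W_full, a, y)
print(f"IPTW, correct propensity model:      {ate_iptw_ok:.3f}")
print(f"IPTW, propensity model omits W1:     {ate_iptw_bad:.3f}")
print(f"TMLE, g omits W1, outcome model OK:  {ate_tmle_bad_g:.3f}")
```

In this setup one would expect the correctly specified weighted estimate and the TMLE estimate to land near 2, while the weighted estimate that omits W1 from the propensity model stays noticeably biased. TMLE recovers the effect here because its outcome model is correct even though its propensity model is not, which is the double-robustness property behind the abstract's claim that TMLE is robust to misspecification.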
