Abstract

When estimating the average effect of a binary treatment (or exposure) on an outcome, methods that incorporate propensity scores, the G‐formula, or targeted maximum likelihood estimation (TMLE) are preferred over naïve regression approaches, which are biased under misspecification of a parametric outcome model. In contrast propensity score methods require the correct specification of an exposure model. Double‐robust methods only require correct specification of either the outcome or the exposure model. Targeted maximum likelihood estimation is a semiparametric double‐robust method that improves the chances of correct model specification by allowing for flexible estimation using (nonparametric) machine‐learning methods. It therefore requires weaker assumptions than its competitors. We provide a step‐by‐step guided implementation of TMLE and illustrate it in a realistic scenario based on cancer epidemiology where assumptions about correct model specification and positivity (ie, when a study participant had 0 probability of receiving the treatment) are nearly violated. This article provides a concise and reproducible educational introduction to TMLE for a binary outcome and exposure. The reader should gain sufficient understanding of TMLE from this introductory tutorial to be able to apply the method in practice. Extensive R‐code is provided in easy‐to‐read boxes throughout the article for replicability. Stata users will find a testing implementation of TMLE and additional material in the Appendix S1 and at the following GitHub repository: https://github.com/migariane/SIM-TMLE-tutorial

Highlights

  • During the last 30 years, modern epidemiology has been able to identify significant limitations of classic epidemiologic methods when the focus is to explain the effect of a risk factor on a disease or outcome.[1,2] In observational studies, treatment groups are typically not directly comparable; a careful statistical adjustment for confounders is necessary to obtain unbiased exposure effect estimates

  • We present the average treatment effect (ATE) and marginal odds ratio (MOR) estimates for 3 different estimators and 3 different versions of targeted maximum likelihood estimation (TMLE) (TMLE‐1‐2‐3, Table 2) corresponding to the usage of either logistic regressions or SL, implemented either with the default or user‐supplied library given in Box 10, for the estimation of the outcome expectation and the propensity score

  • Targeted maximum likelihood estimation implemented with ensemble and machine‐learning algorithms has advantages over other methods, but surprisingly there is limited guidance for the application of the technique for the estimation of the ATE and MOR when dealing with binary outcomes.[32]

Read more

Summary

TUTORIAL IN BIOSTATISTICS

When estimating the average effect of a binary treatment (or exposure) on an outcome, methods that incorporate propensity scores, the G‐formula, or targeted maximum likelihood estimation (TMLE) are preferred over naïve regression approaches, which are biased under misspecification of a parametric outcome model. Targeted maximum likelihood estimation is a semiparametric double‐robust method that improves the chances of correct model specification by allowing for flexible estimation using (nonparametric) machine‐learning methods. It requires weaker assumptions than its competitors. We provide a step‐by‐step guided implementation of TMLE and illustrate it in a realistic scenario based on cancer epidemiology where assumptions about correct model specification and positivity (ie, when a study participant had 0 probability of receiving the treatment) are nearly violated.

| INTRODUCTION
Wy y
Misspecified treatment and outcome models
Findings
| DISCUSSION

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.