Does heterogeneity underlie differences in treatment effects estimated from SuperLearner versus logistic regression? An application in nutritional epidemiology

Julie M Petersen, Ashley I Naimi, Lisa M Bodnar

doi:10.1016/j.annepidem.2023.04.017

Abstract

Purpose: A strength of SuperLearner is that it may accommodate key interactions between model variables without a priori specification. In prior research, protective associations between fruit intake and preeclampsia were stronger when estimated using SuperLearner with targeted maximum likelihood estimation (TMLE) than when estimated using multivariable logistic regression without any interaction terms. We explored whether heterogeneity (i.e., differences in the effect estimate due to interactions between fruit intake and covariates) may partly explain the differences between estimates from these two models.

Methods: Using a U.S. prospective pregnancy cohort (2010–2013, n = 7781), we estimated preeclampsia risk differences (RDs) for higher versus lower fruit density using multivariable logistic regression that included two-way statistical interactions between fruit density and each of the 25 model covariates. We compared these RDs with those from SuperLearner with TMLE (gold standard) and from logistic regression with no interaction.

Results: From the logistic regression models with two-way statistical interactions, 48% of the preeclampsia RDs were ≤ −0.02 (closer to the SuperLearner with TMLE estimate), 40% equaled −0.01 (the same as the estimate from logistic regression with no interaction), and a minority of RDs were at or crossed the null.

Conclusions: Our exploratory analysis provided preliminary evidence that heterogeneity may partly explain differences in estimates from logistic regression versus SuperLearner with TMLE.
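One common way to turn a logistic model containing an exposure-by-covariate product term into a single risk difference is marginal standardization (g-computation): predict every person's risk with exposure set to "higher" and then to "lower", and average the difference. The abstract does not state how the RDs were derived, so this conversion is an assumption. The minimal Python sketch below uses a hypothetical toy cohort with one binary covariate instead of 25 (not the study data, and not SuperLearner or TMLE); the names `fit_logistic`, `gcomp_rd`, and `CELLS` are illustrative only. Because the toy exposure effect differs across covariate strata, including or omitting the interaction term changes the standardized RD.

```python
import math


def expit(z):
    """Inverse logit, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def solve(a, b):
    """Solve the small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, ys, iters=50, tol=1e-12):
    """Maximum-likelihood logistic regression via Newton-Raphson (equivalently IRLS)."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p                   # score vector X'(y - mu)
        info = [[0.0] * p for _ in range(p)]  # information matrix X'WX
        for x, y in zip(rows, ys):
            mu = expit(sum(b * v for b, v in zip(beta, x)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (y - mu) * x[j]
                for k in range(p):
                    info[j][k] += w * x[j] * x[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def design(a, x, interaction):
    # Intercept, exposure a, covariate x, and optionally the a*x product term.
    row = [1.0, float(a), float(x)]
    if interaction:
        row.append(float(a * x))
    return row


def gcomp_rd(data, interaction):
    """Standardized RD: mean predicted risk with a=1 minus mean predicted risk with a=0."""
    rows = [design(a, x, interaction) for a, x, _ in data]
    ys = [y for _, _, y in data]
    beta = fit_logistic(rows, ys)

    def predict(a, x):
        return expit(sum(b * v for b, v in zip(beta, design(a, x, interaction))))

    r1 = sum(predict(1, x) for _, x, _ in data) / len(data)
    r0 = sum(predict(0, x) for _, x, _ in data) / len(data)
    return r1 - r0


# Hypothetical toy cohort (NOT study data): (covariate x, exposure a, n, events).
# Stratum-specific RDs differ: 0.00 when x = 0 and -0.20 when x = 1.
CELLS = [(0, 0, 800, 80), (0, 1, 100, 10), (1, 0, 50, 20), (1, 1, 50, 10)]
data = [(a, x, 1 if i < events else 0)
        for x, a, n, events in CELLS for i in range(n)]

rd_int = gcomp_rd(data, interaction=True)
rd_main = gcomp_rd(data, interaction=False)
print(f"RD with exposure x covariate interaction: {rd_int:.4f}")
print(f"RD without interaction: {rd_main:.4f}")
```

Run as a script, the interaction model should print -0.0200, matching the hand-standardized value 0.9 × (0.10 − 0.10) + 0.1 × (0.20 − 0.40) = −0.02, because with one binary covariate that model is saturated and reproduces the stratum risks exactly. The main-effects model forces a common odds ratio across strata and lands near −0.038 in these toy data. Here omitting the interaction exaggerates the effect, but the direction of such a discrepancy depends on the data.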