Pseudo-random Number Generator Influences on Average Treatment Effect Estimates Obtained with Machine Learning.

Ashley I Naimi, Ya-Hui Yu, Lisa M Bodnar

doi:10.1097/ede.0000000000001785

Abstract

The use of machine learning to estimate exposure effects introduces a dependence between the results of an empirical study and the value of the seed used to fix the pseudo-random number generator. We used data from 10,038 pregnant women and a 10% subsample (N = 1004) to examine the extent to which the risk difference for the relation between fruit and vegetable consumption and preeclampsia risk changes under different seed values. We fit an augmented inverse probability weighted estimator with two Super Learner algorithms: a simple algorithm including random forests and single-layer neural networks, and a more complex algorithm with a mix of tree-based, regression-based, penalized, and simple algorithms. We evaluated the distributions of risk differences, standard errors, and P values that result from 5000 different seed value selections. Our findings suggest important variability in the risk difference estimates, as well as an important effect of the stacking algorithm used. The interquartile range width of the risk differences in the full sample with the simple algorithm was 13 per 1000. However, all other interquartile ranges were roughly an order of magnitude lower. The medians of the distributions of risk differences differed according to the sample size and the algorithm used. Our findings add another dimension of concern regarding the potential for "p-hacking," and further underscore the need to move away from simplistic evidentiary thresholds in empirical research. When empirical results depend on pseudo-random number generator seed values, caution is warranted in interpreting these results.
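The seed dependence described above can be illustrated with a toy simulation. The sketch below is not the authors' analysis: it uses hypothetical simulated data (a four-level confounder, a binary exposure, and a binary outcome) instead of the pregnancy cohort, and it swaps Super Learner for a much simpler stochastic learner, stratum means averaged over bootstrap resamples, which depends on the seed through its bootstrap draws much as a random forest does. The function names (`simulate`, `bagged_means`, `aipw_risk_difference`), the number of bags, and the 200 seeds (rather than 5000) are all illustrative assumptions. The data seed is held fixed so that only the analysis seed varies, and the augmented inverse probability weighted risk difference, its influence-function standard error, and a Wald P value are recomputed for each seed.

```python
import math
import random
import statistics


def simulate(n, seed=2024):
    """Hypothetical data: 4-level confounder L, binary exposure A, binary outcome Y.

    The data seed is fixed so that only the analysis seed varies below.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = rng.randint(0, 3)
        a = int(rng.random() < 0.3 + 0.1 * l)
        y = int(rng.random() < 0.05 + 0.02 * l - 0.02 * a)
        rows.append((l, a, y))
    return rows


def bagged_means(rows, key, value, rng, n_bags=5):
    """Stratum means of value(row) within key(row), averaged over bootstrap resamples.

    Hypothetical stand-in for a stochastic learner such as a random forest:
    the fitted values depend on the bootstrap draws, hence on the state of rng.
    """
    per_stratum = {}
    for _ in range(n_bags):
        boot = [rng.choice(rows) for _ in rows]
        total, count = {}, {}
        for r in boot:
            k = key(r)
            total[k] = total.get(k, 0.0) + value(r)
            count[k] = count.get(k, 0) + 1
        for k in total:
            per_stratum.setdefault(k, []).append(total[k] / count[k])
    return {k: statistics.fmean(v) for k, v in per_stratum.items()}


def aipw_risk_difference(rows, seed):
    """AIPW risk difference, influence-function SE, and two-sided Wald P value."""
    rng = random.Random(seed)  # the analysis seed whose influence is being studied
    g = bagged_means(rows, key=lambda r: r[0], value=lambda r: r[1], rng=rng)
    q = bagged_means(rows, key=lambda r: (r[0], r[1]), value=lambda r: r[2], rng=rng)
    terms = []
    for l, a, y in rows:
        p = min(max(g[l], 0.01), 0.99)  # bound the propensity score away from 0 and 1
        q1, q0 = q[(l, 1)], q[(l, 0)]
        terms.append(q1 - q0 + a / p * (y - q1) - (1 - a) / (1 - p) * (y - q0))
    rd = statistics.fmean(terms)
    se = statistics.stdev(terms) / math.sqrt(len(terms))
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(rd) / se))
    return rd, se, p_value


if __name__ == "__main__":
    data = simulate(1000)
    results = [aipw_risk_difference(data, seed) for seed in range(200)]
    rds = [r[0] for r in results]
    lower, median, upper = statistics.quantiles(rds, n=4)
    share_sig = sum(r[2] < 0.05 for r in results) / len(results)
    print(f"median RD: {1000 * median:.1f} per 1000")
    print(f"IQR width: {1000 * (upper - lower):.2f} per 1000")
    print(f"share of seeds with P < 0.05: {share_sig:.2f}")
```

Rerunning with the same seed reproduces the same estimate, while different seeds yield a distribution of risk differences and P values from a single fixed data set. The specific numbers printed depend entirely on the hypothetical simulation and say nothing about the study's results.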

Published in: Epidemiology (Cambridge, Mass.)

Similar Papers

Does COVID-19 cause pre-eclampsia?
A Khalil ... P O'Brien
Ultrasound in obstetrics & gynecology: the official journal of the International Society of Ultrasound in Obstetrics and Gynecology | VOL. 59
13 Jan 2022

At random
Electronics Letters | VOL. 55
01 May 2019

Impact of ABO Blood Group Type on Risk of Venous Thromboembolism in Patients with Cancer
Cornelia Englisch ... Cihan Ay
Blood | VOL. 138
05 Nov 2021

Hyperlocalized Measures of Air Pollution and Preeclampsia in Oakland, California.
Dana E Goin ... M Maria Glymour
Environmental Science & Technology | VOL. 55
14 Oct 2021