Pseudo-Random Number Generator Influences on Average Treatment Effect Estimates Obtained with Machine Learning.

Ashley I Naimi, Ya-Hui Yu, Lisa M Bodnar

doi:10.1097/ede.0000000000001785

Abstract

Use of machine learning to estimate exposure effects introduces a dependence between the results of an empirical study and the value of the seed used to fix the pseudo-random number generator. We used data from 10,038 pregnant women and a 10% subsample (N = 1,004) to examine the extent to which the risk difference for the relation between fruit and vegetable consumption and preeclampsia risk changes under different seed values. We fit an augmented inverse probability weighted estimator with two Super Learner algorithms: a simple algorithm including random forests and single-layer neural networks, and a more complex algorithm with a mix of tree-based, regression-based, penalized, and simple algorithms. We evaluated the distributions of risk differences, standard errors, and p values that result from 5,000 different seed value selections. Our findings suggest important variability in the risk difference estimates, as well as an important effect of the stacking algorithm used. The interquartile range width (IQRw) of the risk differences in the full sample with the simple algorithm was 13 per 1000. However, the IQRw in all other settings was roughly an order of magnitude lower. The medians of the distributions of risk differences differed according to the sample size and the algorithm used. Our findings add another dimension of concern regarding the potential for "p-hacking" and further support the need to move away from simplistic evidentiary thresholds in empirical research. When empirical results depend on pseudo-random number generator seed values, caution is warranted in interpreting these results.
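The seed-sweep design described above can be illustrated with a minimal sketch. It is not the authors' analysis: the data are simulated, the nuisance models are simple stratified means rather than Super Learner, and the seed enters only through the cross-fitting fold assignment (an assumption made here for illustration). All function names (`simulate`, `aipw_risk_difference`, `seed_sweep`, `iqr_width`) are hypothetical.

```python
import random
import statistics


def simulate(n, rng):
    """Hypothetical data: binary confounder x, binary exposure a, binary outcome y."""
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.4 else 0
        a = 1 if rng.random() < (0.6 if x else 0.3) else 0
        y = 1 if rng.random() < 0.10 + 0.05 * x - 0.02 * a else 0
        data.append((x, a, y))
    return data


def _mean(values, default):
    """Mean of a list, falling back to `default` when the list is empty."""
    return sum(values) / len(values) if values else default


def _clip(p, lo=0.01, hi=0.99):
    """Bound a propensity estimate away from 0 and 1."""
    return min(max(p, lo), hi)


def aipw_risk_difference(data, seed, n_folds=2):
    """Cross-fitted AIPW risk difference; the seed only changes the fold split."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    scores = []
    for fold in folds:
        held = set(fold)
        train = [data[i] for i in idx if i not in held]
        overall = _mean([y for _, _, y in train], 0.5)
        # Nuisance models: stratified means (stand-ins for machine learning fits).
        ps = {x: _clip(_mean([a for xx, a, _ in train if xx == x], 0.5))
              for x in (0, 1)}
        mu = {(x, a): _mean([y for xx, aa, y in train if xx == x and aa == a], overall)
              for x in (0, 1) for a in (0, 1)}
        for i in fold:
            x, a, y = data[i]
            e, m1, m0 = ps[x], mu[(x, 1)], mu[(x, 0)]
            # AIPW score for the risk difference.
            scores.append(m1 - m0 + a * (y - m1) / e - (1 - a) * (y - m0) / (1 - e))
    return sum(scores) / len(scores)


def seed_sweep(data, seeds):
    """Re-estimate the risk difference once per seed on the same data."""
    return [aipw_risk_difference(data, s) for s in seeds]


def iqr_width(values):
    """Interquartile range width (third quartile minus first quartile)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


data = simulate(500, random.Random(2024))
estimates = seed_sweep(data, range(50))
print("median RD per 1000:", 1000 * statistics.median(estimates))
print("IQRw per 1000:", 1000 * iqr_width(estimates))
```

The data are held fixed across the sweep, so any spread in `estimates` comes from the seed alone. Replacing the stratified means with stochastic learners such as random forests would add a second source of seed dependence.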

Full Text