Abstract

BACKGROUND AND AIM: Machine learning approaches are increasingly used in environmental mixtures epidemiology. We evaluated the operating characteristics of currently available machine learning approaches in estimating individual exposure effects, joint mixture effects, and interaction effects on a time-to-event outcome.

METHODS: We conducted an extensive search for methods that allow for time-to-event outcomes, multiple continuous exposures, non-linear and interaction effects on the outcome, and inference (i.e., that provide estimates and standard errors). We selected Bayesian Additive Regression Trees (BART), the Cox Proportional-Hazards model with penalized splines, Gaussian Process Regression (GPR), and Multivariate Adaptive Regression Splines (MARS). Additionally, we included the Cox Proportional-Hazards model and Cox Elastic-Net due to their popularity. We compared estimates across approaches on the association of six metals with incident cardiovascular disease in the Strong Heart Study.

RESULTS: The estimates of the hazard ratio for the main metal of interest, selenium, at its 75th versus 25th percentile, holding all other metals constant, ranged from 1.29 (1.17, 1.39) to 2.00 (1.09, 3.19), estimated using Cox Elastic-Net and BART, respectively. Similar trends were found for estimates of the overall mixture effect on the hazard ratio scale when all metals are at their 75th versus 25th percentile. These estimates ranged from 2.09 (1.82, 3.29) to 4.21 (2.83, 6.93), estimated using Cox Elastic-Net and GPR, respectively. The more flexible approaches estimated larger effects with greater uncertainty.

CONCLUSIONS: In this study, results across approaches agreed qualitatively but differed quantitatively. Increased hazards were found at higher levels of metals, but the magnitude varied. Although the overall conclusion is consistent, the estimates may have different clinical impacts. The fact that the more flexible methods detected interaction and non-linear effects of metals but had higher uncertainty reveals a substantial bias-variance tradeoff. To enhance reproducibility in environmental epidemiology, it is important to show whether results are robust across different modeling approaches.

KEYWORDS: Survival, Mixtures analysis, Modeling, Cardiovascular diseases
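
The 75th-versus-25th-percentile contrasts reported above can be illustrated with a minimal sketch for the simplest case only: a Cox model that is linear in each exposure and has no interaction terms. In that case the log hazard ratio for one exposure is its coefficient times its interquartile range, and the joint mixture contrast is the sum of those terms. The exposure names, values, and coefficients below are hypothetical and are not taken from the Strong Heart Study. The flexible approaches (BART, GPR, MARS, penalized splines) would instead require contrasting model predictions at the two exposure profiles.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a list of numbers."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def iqr_hazard_ratio(beta, values):
    """HR for one exposure at its 75th vs 25th percentile, others held constant.

    Assumes a Cox model that is linear in this exposure (no splines, no interactions).
    """
    return math.exp(beta * (percentile(values, 75) - percentile(values, 25)))


def mixture_hazard_ratio(betas, exposures):
    """HR when every exposure moves from its 25th to its 75th percentile."""
    log_hr = sum(
        betas[name] * (percentile(vals, 75) - percentile(vals, 25))
        for name, vals in exposures.items()
    )
    return math.exp(log_hr)


# Hypothetical exposure values and log-hazard coefficients (illustration only).
exposures = {
    "metal_a": [0.0, 1.0, 2.0, 3.0, 4.0],
    "metal_b": [0.0, 0.5, 1.0, 1.5, 2.0],
}
betas = {"metal_a": 0.10, "metal_b": 0.20}

hr_a = iqr_hazard_ratio(betas["metal_a"], exposures["metal_a"])
hr_b = iqr_hazard_ratio(betas["metal_b"], exposures["metal_b"])
hr_mix = mixture_hazard_ratio(betas, exposures)
print(f"metal_a: {hr_a:.3f}  metal_b: {hr_b:.3f}  mixture: {hr_mix:.3f}")
```

Under these assumptions the mixture hazard ratio equals the product of the single-exposure hazard ratios. Once non-linear or interaction effects are allowed, that identity no longer has to hold.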
