Abstract

Bayesian additive regression tree (BART) is a recent statistical method that combines ensemble learning and nonparametric regression. BART is constructed under a probabilistic framework that also allows for model-based prediction uncertainty quantification. We evaluated the application of BART in predicting daily concentrations of four fine particulate matter (PM2.5) components (elemental carbon, organic carbon, nitrate, and sulfate) in California during the period 2005 to 2014. We demonstrate in this paper how BART can be tuned to optimize prediction performance and how to evaluate variable importance. Our BART models included, as predictors, a large suite of land-use variables, meteorological conditions, satellite-derived aerosol optical depth parameters, and simulations from a chemical transport model. In cross-validation experiments, BART demonstrated good out-of-sample prediction performance at monitoring locations (R2 from 0.62 to 0.73). More importantly, prediction intervals associated with concentration estimates from BART showed good coverage probability at locations with and without monitoring data. In our case study, major PM2.5 components could be estimated with good accuracy, especially when collocated PM2.5 total mass observations were available. In conclusion, BART is an attractive approach for modeling ambient air pollution levels, especially for its ability to provide uncertainty in estimates that may be useful for subsequent health impact and health effect analyses.
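The two evaluation metrics named in the abstract, out-of-sample R2 and prediction-interval coverage, can be illustrated with a minimal sketch. It assumes that posterior predictive draws for held-out observations are already available as an array; the function names, the 95% interval level, and the synthetic data are illustrative assumptions, not the authors' implementation or data.

```python
import numpy as np


def r_squared(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def interval_coverage(y_obs, draws, level=0.95):
    """Fraction of held-out observations inside the central interval.

    draws has shape (n_draws, n_obs): one row per posterior predictive draw.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    alpha = 1.0 - level
    lower = np.quantile(draws, alpha / 2, axis=0)
    upper = np.quantile(draws, 1.0 - alpha / 2, axis=0)
    return float(np.mean((y_obs >= lower) & (y_obs <= upper)))


# Synthetic stand-in for held-out data (assumption: not the paper's data).
rng = np.random.default_rng(0)
n_obs, n_draws = 500, 2000
mu = rng.normal(size=n_obs)                    # hypothetical true signal
y = mu + rng.normal(scale=0.5, size=n_obs)     # held-out observations
draws = mu + rng.normal(scale=0.5, size=(n_draws, n_obs))  # predictive draws

r2 = r_squared(y, draws.mean(axis=0))          # point prediction = posterior mean
coverage = interval_coverage(y, draws, level=0.95)
print(f"R2 = {r2:.2f}, 95% interval coverage = {coverage:.2f}")
```

A well-calibrated model should yield coverage close to the nominal level; coverage well below it would suggest that the intervals understate prediction uncertainty.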

Highlights

  • Ambient fine particulate matter pollution (PM2.5) is regulated worldwide because of its well-established associations with cardiorespiratory diseases and premature mortality [1]

  • In this paper, we examine the use of a recent statistical learning algorithm, Bayesian additive regression tree (BART) [9,10], for predicting PM2.5 components

  • In spatial cross-validation (CV), R2 was highest for sulfate (SO4) and lowest for elemental carbon (EC), which can be explained by the lower and higher spatial heterogeneity, respectively, of these two pollutants


Summary

Introduction

Ambient fine particulate matter pollution (PM2.5) is regulated worldwide because of its well-established associations with cardiorespiratory diseases and premature mortality [1]. The ability to accurately estimate PM2.5 components at locations and time points without monitoring data can better support epidemiological analyses.

Atmosphere 2020, 11, 1233

[...] and satellite-derived parameters. These include generalized additive models that allow for nonlinear associations [3], geostatistical models that incorporate spatial–temporal dependence [4], and machine learning algorithms such as random forest, neural networks, and ensemble modeling [5,6,7,8]. The main advantages of machine learning methods include the ability to handle large sets of highly correlated predictors and the ability to construct complex predictive algorithms that are nonadditive and nonlinear.
