A growing list of chemicals is approved for production and use in the United States and elsewhere, and new approaches are needed to rapidly assess the potential exposure and health hazard posed by these substances. Here, we present a high-throughput, data-driven approach to help estimate occupational exposure using a database of over 1.5 million observations of chemical concentrations in U.S. workplace air samples. We fit a Bayesian hierarchical model that uses industry type and the physicochemical properties of a substance to predict the distribution of workplace air concentrations. This model substantially outperforms a null model when predicting whether a substance will be detected in an air sample and, if so, at what concentration, with 75.9% classification accuracy and a root-mean-square error (RMSE) of 1.00 log10 mg m⁻³ when applied to a held-out test set of substances. This modeling framework can be used to predict air concentration distributions for new substances, which we demonstrate by making predictions for 5587 new substance-by-workplace-type pairs reported in the U.S. EPA's Toxic Substances Control Act (TSCA) Chemical Data Reporting (CDR) industrial use database. It also allows for improved consideration of occupational exposure within the context of high-throughput, risk-based chemical prioritization efforts.
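The two reported performance measures can be illustrated with a minimal sketch. It assumes that classification accuracy is the fraction of samples whose detect/non-detect status is predicted correctly, and that RMSE is computed on log10-transformed concentrations of detected samples. The function names and toy values below are hypothetical and are not taken from the study's data or code.

```python
import math


def detection_accuracy(detected_true, detected_pred):
    """Fraction of samples whose detect/non-detect status is predicted correctly."""
    matches = sum(t == p for t, p in zip(detected_true, detected_pred))
    return matches / len(detected_true)


def log10_rmse(conc_true, conc_pred):
    """RMSE between observed and predicted concentrations on the log10 scale."""
    squared_errors = [
        (math.log10(t) - math.log10(p)) ** 2
        for t, p in zip(conc_true, conc_pred)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy values for illustration only (not study data).
detected_true = [True, False, True, True]
detected_pred = [True, False, False, True]
conc_true = [0.1, 10.0]  # observed concentrations, mg per cubic meter
conc_pred = [1.0, 1.0]   # predicted concentrations, mg per cubic meter

acc = detection_accuracy(detected_true, detected_pred)
rmse = log10_rmse(conc_true, conc_pred)
print(f"accuracy = {acc:.2f}, RMSE = {rmse:.2f} log10 units")
```

Under this reading, an RMSE of 1.00 in log10 units corresponds to a typical prediction error of roughly one order of magnitude in concentration.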