Local exposure misclassification in national models: relationships with urban infrastructure and demographics.

Joshua S. Apte, Michelle Audirac, Corwin M. Zigler, Sarah E. Chambliss, Mark Joseph Campmier

doi:10.1038/s41370-023-00624-z

Open Access

Abstract

National-scale linear regression-based modeling may mischaracterize localized patterns, including hyperlocal peaks and neighborhood- to regional-scale gradients. For studies focused on within-city differences, this mischaracterization poses a risk of exposure misclassification, affecting epidemiological and environmental justice conclusions. Our objective is to characterize the difference between intraurban pollution patterns predicted by national-scale land use regression modeling and observation-based estimates within a localized domain, and to examine the relationship between that difference and urban infrastructure and demographics. We compare highly resolved (0.01 km²) observations of NO2 mixing ratio and ultrafine particle (UFP) count obtained via mobile monitoring with national model predictions in thirteen neighborhoods in the San Francisco Bay Area. Grid cell-level divergence between modeled and observed concentrations is termed "localized difference." We use a flexible machine learning modeling technique, Bayesian Additive Regression Trees, to investigate potentially nonlinear relationships between localized difference and known local emission sources as well as census block group racial/ethnic composition. We find that observed local pollution extremes are not represented by land use regression predictions and that observed UFP count significantly exceeds regression predictions. Machine learning models show significant nonlinear relationships between localized difference and the density of several types of pollution-related infrastructure (roadways, commercial and industrial operations). In addition, localized difference was greater in areas with higher population density and a lower share of white non-Hispanic residents, indicating that exposure misclassification by national models differs among subpopulations.
Comparing national-scale pollution predictions with hyperlocal observations in the San Francisco Bay Area, we find greater discrepancies near major roadways and food service locations and systematic underestimation of concentrations in neighborhoods with a lower share of non-Hispanic white residents. These findings carry implications for using national-scale models in intraurban epidemiological and environmental justice applications and establish the potential utility of supplementing large-scale estimates with publicly available urban infrastructure and pollution source information.
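
As an illustration of the grid cell-level "localized difference" defined above, the following is a minimal sketch and not the authors' implementation. It assumes the difference is taken as observed minus predicted (the abstract does not state the sign convention). The function name, cell identifiers, and concentration values are hypothetical and are not data from the study.

```python
from statistics import mean


def localized_difference(observed, predicted):
    """Return the per-grid-cell difference between observed and predicted values.

    Assumption: the sign convention is observed minus predicted, so a positive
    value means the national model underestimates the local concentration.
    """
    if observed.keys() != predicted.keys():
        raise ValueError("observed and predicted must cover the same grid cells")
    return {cell: observed[cell] - predicted[cell] for cell in observed}


# Hypothetical NO2 mixing ratios (ppb) for three 0.01 km² (100 m x 100 m) cells.
observed = {"cell_a": 18.0, "cell_b": 9.5, "cell_c": 12.0}
predicted = {"cell_a": 12.0, "cell_b": 10.0, "cell_c": 12.0}

diff = localized_difference(observed, predicted)
mean_diff = mean(diff.values())

for cell, value in diff.items():
    print(f"{cell}: {value:+.1f} ppb")
print(f"mean localized difference: {mean_diff:+.2f} ppb")
```

In a fuller analysis along the lines the abstract describes, these per-cell differences would serve as the response variable in a flexible regression against infrastructure density and census block group composition; that modeling step is not sketched here.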

Full Text