High-resolution, comprehensive exposure data are crucial for accurately estimating the human health impact of PM2.5. In recent years, satellite remote sensing data have been increasingly used in PM2.5 models to overcome the limited spatial coverage of ground monitoring stations. However, most existing PM2.5 models suffer from reduced performance and gaps in their predictions because of missing values in satellite-retrieved parameters such as aerosol optical depth (AOD), the sparsity of regulatory air quality monitors available for model training, and nonlinear relationships between PM2.5 and meteorological conditions. In this study, spatial gaps in AOD obtained from the Geostationary Operational Environmental Satellite-16 (GOES-16) are filled using AOD estimates from the Goddard Earth Observing System Composition Forecasting (GEOS-CF) system. Furthermore, to improve model performance, meteorological predictors such as temperature from the High-Resolution Rapid Refresh (HRRR) model are preprocessed with a Daubechies wavelet transform to extract their low- and high-frequency components. The spatially gap-filled AOD and the meteorological data are then ingested into a machine learning model to predict hourly PM2.5 at a 1 km spatial resolution in California. The model evaluation metrics (out-of-bag (OOB) R² = 0.86 and root-mean-square error (RMSE) = 9.27 μg/m³; 10-fold spatial cross-validation R² = 0.82 and RMSE = 9.82 μg/m³) demonstrate the model's reliability in predicting ambient PM2.5, especially for states like California that experience frequent wildfire episodes.
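The two preprocessing steps described above, AOD gap filling and wavelet decomposition of a meteorological predictor, can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: missing satellite AOD is replaced directly by the collocated model value (the study may well apply a more elaborate fusion or bias correction), the wavelet is the 4-tap Daubechies filter with a single decomposition level, and the series is extended periodically at its boundaries. The function names `fill_aod_gaps` and `dwt_level1` are hypothetical and used only for illustration.

```python
import math

# 4-tap Daubechies scaling (low-pass) filter; the filter order is an assumption,
# since the abstract does not say which Daubechies wavelet was used.
_S3 = math.sqrt(3.0)
_NORM = 4.0 * math.sqrt(2.0)
LOW_PASS = [(1 + _S3) / _NORM, (3 + _S3) / _NORM,
            (3 - _S3) / _NORM, (1 - _S3) / _NORM]
# Matching wavelet (high-pass) filter from the quadrature-mirror relation.
HIGH_PASS = [LOW_PASS[3], -LOW_PASS[2], LOW_PASS[1], -LOW_PASS[0]]


def fill_aod_gaps(satellite_aod, model_aod):
    """Keep the satellite AOD where present, else use the model AOD.

    Direct substitution is an assumed, simplified gap-filling rule.
    Missing values are represented by None or NaN.
    """
    filled = []
    for sat, mod in zip(satellite_aod, model_aod):
        missing = sat is None or math.isnan(sat)
        filled.append(mod if missing else sat)
    return filled


def dwt_level1(signal):
    """One-level wavelet transform with periodic boundary extension.

    Returns (approximation, detail): the low-frequency and
    high-frequency components, each half the length of the input.
    """
    n = len(signal)
    if n < 4 or n % 2:
        raise ValueError("signal length must be even and at least 4")
    approx, detail = [], []
    for k in range(n // 2):
        # Four consecutive samples starting at 2k, wrapping around the end.
        window = [signal[(2 * k + i) % n] for i in range(4)]
        approx.append(sum(h * x for h, x in zip(LOW_PASS, window)))
        detail.append(sum(g * x for g, x in zip(HIGH_PASS, window)))
    return approx, detail
```

Applied to an hourly temperature series, `dwt_level1` returns a smoothed approximation and a detail series that could each serve as a separate predictor; in practice a maintained wavelet library would be a more robust choice than this hand-written filter bank.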