Abstract
Despite the availability of numerous satellite-based machine-learning methods for supplementing air quality monitoring data, models estimating ground-level PM2.5 concentrations have proven unsatisfactory at a daily scale during periods without PM2.5 observations, with validated R2 values for daily estimates in the literature typically below 0.55. In this study, we introduce a novel ensemble learning-based hindcast modeling method that incorporates information on predictors in periods without PM2.5 observations into the modeling process to enhance the reliability of daily estimates for those unmonitored historical years. Our proposed method constructs an annually varying relationship between PM2.5 and the predictors during periods without PM2.5 observations, in contrast to previous hindcast modeling procedures, which assumed that the established PM2.5-predictor relationship was the same in periods with and without PM2.5 observations. Using northern China, a region heavily polluted by PM2.5, from 2013 to 2020 as an example, our history-informed machine-learning hindcast model outperformed the state-of-the-art method by a large margin, improving the leave-one-year-out cross-validation (CV) R2 [root-mean-square error (RMSE)] from 0.56 [34.66 μg/m3] to 0.63 [31.79 μg/m3] at a daily scale, while achieving a comparable sample-based 10-fold CV R2 [RMSE] of 0.90 [16.85 μg/m3]. In addition, incorporating satellite aerosol optical depth (AOD) into the hindcast modeling further improved the historical PM2.5-predictor relationship and provided more reliable estimates during periods without PM2.5 observations, increasing the leave-one-year-out CV R2 [RMSE] from 0.61 [30.31 μg/m3] for the model without AOD to 0.65 [28.56 μg/m3].
Consequently, our proposed hindcast modeling framework enables the production of more reliable PM2.5 estimates during periods without PM2.5 observations and holds promise for application to other regions and air pollutants for which predictor records cover a longer period than records of the response variable.
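The leave-one-year-out CV reported above can be illustrated with a minimal sketch. It is not the study's ensemble model: a one-predictor least-squares line stands in for the learner, R2 is assumed to be the coefficient of determination, and the function names and the toy records are hypothetical. The point is only the splitting scheme, in which every year is held out in turn so that each prediction comes from a model that never saw PM2.5 observations from that year.

```python
import math


def r2_rmse(observed, predicted):
    """Return (R2, RMSE) for paired observed and predicted values.

    R2 is computed as the coefficient of determination (an assumption;
    some studies report the squared correlation instead).
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (stand-in for the real learner)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leave_one_year_out(records):
    """Pooled (R2, RMSE) over all held-out years.

    records: list of (year, predictor_value, pm25) tuples.
    """
    years = sorted({year for year, _, _ in records})
    observed, predicted = [], []
    for held_out in years:
        # Train on every year except the held-out one.
        train = [(x, y) for yr, x, y in records if yr != held_out]
        test = [(x, y) for yr, x, y in records if yr == held_out]
        intercept, slope = fit_line([x for x, _ in train], [y for _, y in train])
        for x, y in test:
            observed.append(y)
            predicted.append(intercept + slope * x)
    return r2_rmse(observed, predicted)


# Hypothetical toy records: (year, predictor value, PM2.5 in μg/m3).
toy_records = [
    (2013, 0.8, 95.0), (2013, 0.5, 70.0),
    (2014, 0.7, 80.0), (2014, 0.4, 52.0),
    (2015, 0.6, 61.0), (2015, 0.3, 38.0),
]
r2, rmse = leave_one_year_out(toy_records)
print(f"leave-one-year-out CV: R2={r2:.2f}, RMSE={rmse:.2f} μg/m3")
```

A sample-based 10-fold CV would instead shuffle all records into folds regardless of year, so each test fold shares its years with the training data. That is consistent with the abstract reporting a much higher R2 for sample-based CV than for leave-one-year-out CV.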