Development of an integrated machine learning model to improve the secondary inorganic aerosol simulation over the Beijing–Tianjin–Hebei region

Ning Ding, Xiao Tang, Huangjian Wu, Lei Kong, Xu Dao, Zifa Wang, Jiang Zhu

doi:10.1016/j.atmosenv.2024.120483

https://doi.org/10.1016/j.atmosenv.2024.120483

Journal: Atmospheric Environment

Publication Date: Mar 25, 2024

Abstract

Secondary inorganic aerosols (sulfate, nitrate, and ammonium; SNA) are key components of PM2.5 in China. Accurate and seamless SNA concentration data are therefore important for pollution control and environmental health risk studies. However, the spatiotemporal characteristics and changes of SNA remain difficult to estimate accurately because SNA datasets with high accuracy and resolution are scarce and because SNA simulations from numerical air-quality models carry substantial uncertainties. In this study, we developed an integrated model to improve hourly SNA simulations at 5 km spatial resolution over the Beijing–Tianjin–Hebei (BTH) region by combining random forest (RF) and Light Gradient Boosting Machine (LGBM) models through stacked generalization. The model fuses three-dimensional (3D) numerical simulations from the Weather Research and Forecasting model (WRF) and the Nested Air Quality Prediction Modeling System (NAQPMS) with observations from 16 sites in the BTH monitoring network. Three months of data, from January to March 2020, were used to evaluate model performance with cross-validation (CV). The results showed that the integrated model provides more accurate SNA simulations than the 3D numerical model, with root mean square error (RMSE) reduced by 33%, 45%, and 35%; correlation coefficient (R) increased by 61%, 28%, and 34%; and Taylor skill score (TSS) increased by 331%, 85%, and 65% for sulfate, nitrate, and ammonium, respectively. Moreover, the integrated model achieved better evaluation scores and more accurate spatiotemporal characteristics than a single machine learning (ML) model, especially in heavily polluted areas. This study provides a new approach to improving SNA simulations and reveals the potential of ML models for improving aerosol modeling when observational data are scarce.
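The three evaluation metrics named in the abstract, and the percentage changes reported for them, can be computed from paired observed and simulated concentrations. The following is a minimal sketch, not the authors' evaluation code. It assumes the Taylor skill score takes the fourth-power form from Taylor (2001) with a maximum attainable correlation `r0` of 1; the abstract does not state which form was used. The function names and the toy concentration values are hypothetical and chosen only for illustration.

```python
import math
from statistics import fmean, pstdev


def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    return math.sqrt(fmean((s - o) ** 2 for o, s in zip(obs, sim)))


def pearson_r(obs, sim):
    """Pearson correlation coefficient (population form)."""
    mean_obs, mean_sim = fmean(obs), fmean(sim)
    cov = fmean((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    return cov / (pstdev(obs) * pstdev(sim))


def taylor_skill_score(obs, sim, r0=1.0):
    """Taylor skill score, ASSUMED form:
    4 * (1 + R)^4 / ((sigma + 1/sigma)^2 * (1 + r0)^4),
    where sigma is the simulated-to-observed standard deviation ratio."""
    r = pearson_r(obs, sim)
    sigma = pstdev(sim) / pstdev(obs)
    return 4 * (1 + r) ** 4 / ((sigma + 1 / sigma) ** 2 * (1 + r0) ** 4)


def percent_change(baseline, improved):
    """Relative change of a metric, in percent, against a baseline."""
    return 100.0 * (improved - baseline) / baseline


# Hypothetical toy data (not from the paper): observed concentrations,
# a raw numerical-model series, and a corrected (integrated-model) series.
observed = [10.0, 20.0, 30.0, 40.0, 50.0]
numerical = [25.0, 22.0, 45.0, 38.0, 70.0]
integrated = [12.0, 19.0, 33.0, 41.0, 47.0]

for name, metric in [("RMSE", rmse), ("R", pearson_r), ("TSS", taylor_skill_score)]:
    base = metric(observed, numerical)
    new = metric(observed, integrated)
    print(f"{name}: {base:.3f} -> {new:.3f} ({percent_change(base, new):+.1f}%)")
```

A negative percentage change is an improvement for RMSE, while a positive change is an improvement for R and TSS, which matches how the abstract reports "reduced" and "increased" values.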