PM2.5 is a complex and diverse mixture that significantly affects the environment, human health, and climate. However, existing observation and numerical simulation techniques have limitations, such as a lack of data, high acquisition costs, and multiple sources of uncertainty. These limitations hinder both the acquisition of comprehensive information on PM2.5 chemical composition and the effective implementation of refined air pollution prevention and control strategies. In this study, we developed an optimal deep learning model to estimate hourly mass concentrations of key PM2.5 chemical components without complex chemical analysis. The model was trained on a chronologically ordered, randomly partitioned multivariate dataset whose inputs included atmospheric state indicators, which previous studies did not consider. The correlation coefficients for the key chemical components were at least 0.96, and the root mean square errors ranged from 0.20 to 2.11 µg/m³ over the entire process (training and testing combined). The model accurately captured the temporal characteristics of the key chemical components and outperformed typical machine-learning models, previous studies, and global reanalysis datasets, namely the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) and the Copernicus Atmosphere Monitoring Service ReAnalysis (CAMSRA). We also quantified feature importance using a random forest model, which showed that PM2.5, PM1, visibility, and temperature were the most influential variables for the key chemical components. In conclusion, this study presents a practical approach for accurately obtaining chemical composition information, which can help fill gaps in missing data, improve air pollution monitoring, and support source identification. This approach has the potential to enhance air pollution control strategies and to promote public health and environmental sustainability.
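The two evaluation metrics reported above, the correlation coefficient and the root mean square error, can be illustrated with a minimal sketch. It assumes the correlation coefficient is the Pearson coefficient, which the abstract does not state explicitly. The function names `pearson_r` and `rmse` and the sample values are hypothetical and are not taken from the study.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Covariance and variance terms (the 1/n factors cancel in the ratio).
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs (e.g. µg/m³)."""
    n = len(observed)
    if n != len(predicted) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical hourly concentrations of one component (µg/m³), illustration only.
observed = [4.0, 5.5, 7.0, 6.0, 3.5]
predicted = [4.2, 5.1, 7.4, 5.8, 3.9]

print(f"R = {pearson_r(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f} µg/m³")
```

A high correlation coefficient alone does not rule out a constant offset between predictions and observations, which is why it is reported together with the root mean square error.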