Seasonal prediction of daily PM2.5 concentrations with interpretable machine learning: a case study of Beijing, China.

Yafei Wu, Kewei Shi, Shaowu Lin, Zirong Ye, Ya Fang

doi:10.1007/s11356-022-18913-9

Abstract

Machine learning (ML) has shown high predictive ability in environmental research. Accurate estimation of daily PM2.5 concentrations is a prerequisite for addressing environmental public health issues. However, studies on the interpretability of ML algorithms remain limited. In this study, we aimed to estimate daily PM2.5 concentrations at a seasonal level and to understand the potential mechanisms behind ML algorithms' decisions using SHapley Additive exPlanations (SHAP). Daily ground PM2.5 concentrations and meteorological data were obtained from the Beijing Municipal Ecological and Environmental Monitoring Center and the China Meteorological Data Service Centre for the period from December 2013 to November 2019. We calculated correlation coefficients and the variance inflation factor (VIF) to eliminate collinear variables, and recursive feature elimination (RFE) was then used to select the more important predictors. A series of ML algorithms, including linear regression, variants of linear regression (Ridge, Lasso, Elasticnet), decision tree (DT), k-nearest neighbor (KNN), support vector regression (SVR), ensemble methods (random forest: RF, eXtreme Gradient Boosting: XGBoost), and deep learning (long short-term memory network: LSTM), were developed to estimate seasonal-level daily PM2.5 concentrations. Ten-fold cross-validation was used to tune hyperparameters, and root mean square error (RMSE), mean absolute error (MAE), ratio of performance to deviation (RPD), and Lin's concordance correlation coefficient (LCCC) were used to evaluate model performance. SHAP was used for local and global interpretability analysis. The results showed that PM2.5 concentrations in Beijing followed obvious seasonal patterns. Five variables (precipitation, mean wind speed, sunshine duration, mean surface temperature, and mean relative humidity) were selected for final prediction. LSTM showed much higher accuracy than the traditional ML models, achieving the smallest RMSE of 19.58 µg/m³ and MAE of 15.11 µg/m³. For the selected data set, LSTM showed acceptable agreement (LCCC = 0.41–0.52) and accuracy (RPD = 0.97–1.92). The SHAP analyses revealed that the meteorological factors had different influences on specific predictions, and the complex interactions among them were also illustrated. These results enhance our understanding of the relationships between meteorological factors and PM2.5 and help explain the mechanisms behind ML algorithms' decisions.
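The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so the definitions below are assumptions based on common usage: RPD is taken as the sample standard deviation of the observations divided by the RMSE, and LCCC follows Lin's usual formulation with population (1/n) moments. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math
from statistics import fmean, stdev


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(fmean([(o - p) ** 2 for o, p in zip(observed, predicted)]))


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    return fmean([abs(o - p) for o, p in zip(observed, predicted)])


def rpd(observed, predicted):
    """Ratio of performance to deviation.

    Assumed definition: sample standard deviation of the observations
    divided by the RMSE of the predictions.
    """
    return stdev(observed) / rmse(observed, predicted)


def lccc(observed, predicted):
    """Lin's concordance correlation coefficient.

    Assumed definition: 2 * cov / (var_o + var_p + (mean_o - mean_p)^2),
    using population (1/n) variances and covariance.
    """
    mean_o, mean_p = fmean(observed), fmean(predicted)
    var_o = fmean([(o - mean_o) ** 2 for o in observed])
    var_p = fmean([(p - mean_p) ** 2 for p in predicted])
    cov = fmean([(o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)])
    return 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


# Hypothetical daily PM2.5 values in µg/m³, for illustration only.
observed = [35.0, 80.0, 120.0, 60.0, 45.0]
predicted = [40.0, 70.0, 100.0, 65.0, 50.0]

print("RMSE:", rmse(observed, predicted))
print("MAE: ", mae(observed, predicted))
print("RPD: ", rpd(observed, predicted))
print("LCCC:", lccc(observed, predicted))
```

Under these definitions, lower RMSE and MAE indicate smaller errors, a larger RPD indicates that errors are small relative to the spread of the observations, and an LCCC closer to 1 indicates closer agreement with the 1:1 line.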

Journal: Environmental Science and Pollution Research
Publication Date: Feb 12, 2022
Citations: 20

