PM2.5 concentration forecasting: Development of integrated multivariate variational mode decomposition with kernel Ridge regression and weighted mean of vectors optimization

Hai Tao, Iman Ahmadianfar, Leonardo Goliatt, Syed Shabi Ul Hassan Kazmi, Mohamed A Yassin, Atheer Y Oudah, Raad Z Homod, Hussein Togun, Zaher Mundher Yaseen

doi:10.1016/j.apr.2024.102125

Abstract

Accurate prediction of PM2.5 concentrations in industrial and urban settings is a pressing research problem because of the serious health implications of PM2.5 exposure. Traditional machine learning (ML) models struggle with this task because the concentrations fluctuate from day to day. To address this problem, a new ML framework is proposed that combines LGBM (light gradient-boosting machine) feature selection, MVMD (multivariate variational mode decomposition), KRidge (kernel Ridge regression), and INFO (weighted mean of vectors) optimization. The proposed framework is used to estimate PM2.5 pollution at specific stations in China for a one-time prediction. LGBM feature selection is the first pre-processing step and selects the most important input variables. Next, MVMD splits the initial signal into intrinsic mode functions (IMFs), accommodating the non-stationary, multivariate nature of the signal. KRidge is then applied to each sub-component using the best input features, and the resulting predictions are summed to obtain the PM2.5 levels. To assess the validity of the proposed MVMD-KRidge-INFO model, it is compared against Gaussian process regression (GPR), locally weighted linear regression (LWLR), and multivariate adaptive regression splines (MARS), each in standalone and hybrid modes. The results show that MVMD-KRidge-INFO gives the best PM2.5 forecasts at Huairou and Shunyi, as evidenced by the R, RMSE, MAPE, IA, MdAE, and U95% metrics.
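The decompose, predict, and sum structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several parts are assumptions made purely for illustration: the PM2.5 series is synthetic, a centered moving average plus residual stands in for MVMD, the LGBM feature selection step is omitted, and the kernel Ridge hyperparameters `lam` and `gamma` are fixed by hand rather than tuned by INFO. All function names (`split_additive`, `lagged`, `krr_fit`, `krr_predict`) are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Squared Euclidean distance between every row of A and every row of B.
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def krr_fit(X, y, lam, gamma):
    # Kernel ridge regression dual coefficients: solve (K + lam * I) a = y.
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, coef, X_new, gamma):
    return rbf_kernel(X_new, X_train, gamma) @ coef

def split_additive(signal, window=7):
    # ASSUMPTION: stand-in for MVMD, not MVMD itself. A centered moving
    # average and its residual form two components that sum to the signal.
    smooth = np.convolve(signal, np.ones(window) / window, mode="same")
    return [smooth, signal - smooth]

def lagged(series, n_lags):
    # Each row holds n_lags consecutive values; the target is the next value.
    cols = [series[i:len(series) - n_lags + i] for i in range(n_lags)]
    return np.column_stack(cols), series[n_lags:]

# ASSUMPTION: synthetic daily series, not data from the study.
rng = np.random.default_rng(0)
t = np.arange(300)
pm25 = 60 + 25 * np.sin(2 * np.pi * t / 30) + 8 * rng.standard_normal(300)

n_lags, n_test = 5, 60
# Note: decomposing the full series (with a centered window) lets test-period
# values influence the components; a careful study would avoid this leakage.
components = split_additive(pm25)

forecast = np.zeros(n_test)
for comp in components:
    X, y = lagged(comp, n_lags)
    X_tr, y_tr, X_te = X[:-n_test], y[:-n_test], X[-n_test:]
    mu, sd, y_mean = X_tr.mean(), X_tr.std(), y_tr.mean()
    # Hand-picked hyperparameters; the paper tunes these with INFO instead.
    coef = krr_fit((X_tr - mu) / sd, y_tr - y_mean, lam=1.0, gamma=0.1)
    # Sum the per-component one-step-ahead predictions.
    forecast += krr_predict((X_tr - mu) / sd, coef, (X_te - mu) / sd, 0.1) + y_mean

actual = pm25[-n_test:]
rmse = float(np.sqrt(np.mean((forecast - actual) ** 2)))
print(f"RMSE on the held-out tail: {rmse:.2f}")
```

Because the components sum to the original signal, summing the per-component forecasts yields a forecast of the signal itself; that additive property is the only thing this stand-in shares with a true mode decomposition.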

Full Text