With ongoing industrialization, air pollution has become an urgent global problem, and it is particularly severe in large cities such as Hanoi (Vietnam) and Beijing (China). Air pollution, especially high concentrations of fine particulate matter (PM2.5), harms human health and has significant negative impacts on the environment, the economy, and quality of life. This study aims to improve the accuracy of air pollution forecasts. With machine learning models, meteorologists can predict air pollution levels more reliably and propose more effective mitigation measures. The study uses a multivariate time series dataset of meteorological and air pollution indices recorded in Beijing, China, from 2010 to 2014. Four machine learning models are evaluated, namely Lasso Regression, Support Vector Regression, Random Forest, and XGBoost, together with a Stack model that combines all four. Model performance is measured with three statistical indicators: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the Coefficient of Determination (R²). Among these models, the Stack model provides the most accurate predictions of the PM2.5 index.
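As a minimal sketch of how the three indicators named above are typically computed, the following uses their standard textbook definitions. The function names and the sample values are illustrative assumptions; they are not taken from the study's code or data.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean Absolute Error: mean of the absolute residuals."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up PM2.5 values, for illustration only (not from the Beijing dataset).
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]

print(f"RMSE = {rmse(y_true, y_pred):.3f}")
print(f"MAE  = {mae(y_true, y_pred):.3f}")
print(f"R2   = {r2(y_true, y_pred):.3f}")
```

Lower RMSE and MAE indicate smaller prediction errors, while R² closer to 1 indicates that the model explains more of the variance in the observed values. RMSE penalizes large errors more heavily than MAE, which is one reason the two are usually reported together.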