Air pollution continues to have a significant impact on Europeans living in urban areas, and episodes of elevated particulate matter (PMx) are responsible for a large number of premature deaths each year, mostly due to heart disease and stroke. According to the annual European Environment Agency (EEA) reports, Poland is one of the most polluted countries in Europe. It experiences high PMx concentrations during winter, which mostly result from large emissions and unfavourable weather conditions in combination with environmental features. Thus, in addition to implementing municipal mitigation strategies, it is necessary to alert residents to pollution episodes through accurate PMx forecasting. This research aimed to assess the feasibility of short-term PMx forecasting via machine learning (ML) and to identify the primary meteorological covariates. The data comprised 10 years of hourly winter PM10 and PM2.5 concentrations measured at 11 urban air quality monitoring stations, including background, traffic, and industrial sites, in four large Polish agglomerations: Poznań, Kraków, Łódź, and Gdańsk. These agglomerations cover areas with high population density and diverse environments that extend from the Baltic Sea coast (Tricity) through the lowlands (Poznań and Łódź) to the highlands (Kraków). We tested four ML models: AIC-based stepwise regression, two tree-based algorithms (random forests and XGBoost), and neural networks. Based on our analysis and cross-validation, XGBoost performed the best, followed by random forests and neural networks, while stepwise regression performed the worst. The same ranking was apparent in the threshold exceedance results for the binary forecasts derived from the regression output. Overall, our results confirm the high applicability of ML to short-term air quality prediction with the perfect prognosis (perfect prog) approach.
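The threshold exceedance evaluation mentioned in the abstract can be illustrated with a minimal sketch: a continuous concentration forecast is converted to a binary "exceedance / no exceedance" forecast and compared with observations through a 2×2 contingency table. Everything specific here is an assumption made for illustration and is not taken from the study: the threshold of 50 µg/m³ (the EU daily limit value for PM10), the choice of scores (probability of detection, false alarm ratio, critical success index), the names `ExceedanceScores` and `score_exceedances`, and the sample values, which are invented.

```python
from dataclasses import dataclass


@dataclass
class ExceedanceScores:
    """2x2 contingency table for binary exceedance forecasts."""
    hits: int
    misses: int
    false_alarms: int
    correct_negatives: int

    @property
    def pod(self) -> float:
        """Probability of detection: hits / observed exceedances."""
        observed = self.hits + self.misses
        return self.hits / observed if observed else float("nan")

    @property
    def far(self) -> float:
        """False alarm ratio: false alarms / forecast exceedances."""
        forecast = self.hits + self.false_alarms
        return self.false_alarms / forecast if forecast else float("nan")

    @property
    def csi(self) -> float:
        """Critical success index: hits / (hits + misses + false alarms)."""
        events = self.hits + self.misses + self.false_alarms
        return self.hits / events if events else float("nan")


def score_exceedances(observed, predicted, threshold=50.0):
    """Binarise both series at `threshold` and count the four outcomes.

    The default threshold (50 ug/m3) is an illustrative assumption,
    not a value reported in the abstract.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    hits = misses = false_alarms = correct_negatives = 0
    for obs, pred in zip(observed, predicted):
        obs_exceeds = obs > threshold
        pred_exceeds = pred > threshold
        if obs_exceeds and pred_exceeds:
            hits += 1
        elif obs_exceeds:
            misses += 1
        elif pred_exceeds:
            false_alarms += 1
        else:
            correct_negatives += 1
    return ExceedanceScores(hits, misses, false_alarms, correct_negatives)


# Invented PM10 values (ug/m3), used only to exercise the function.
observed = [30.0, 55.0, 80.0, 45.0, 60.0, 20.0]
predicted = [35.0, 52.0, 48.0, 58.0, 70.0, 25.0]
scores = score_exceedances(observed, predicted)
print(scores, scores.pod, scores.far, scores.csi)
```

Scoring each model's regression output this way would allow the models to be ranked on how well they flag pollution episodes, not only on continuous error measures. Which scores and thresholds the study actually used is not stated in the abstract.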