Abstract

As air pollution is regarded as the single largest environmental health risk in Europe, it is important that communication to the public is up to date, accurate and provides means to avoid exposure to high air pollution levels. Both long- and short-term exposure to outdoor air pollution are associated with increased risks of mortality and morbidity. Up-to-date information on the present and coming days' air quality helps people avoid exposure during episodes with high levels of air pollution. Air quality forecasts can be based on deterministic dispersion modelling, but to be accurate this requires detailed information on future emissions, meteorological conditions and process-oriented dispersion modelling. In this paper we apply different machine learning (ML) algorithms – Random Forest (RF), Extreme Gradient Boosting (XGB) and Long Short-Term Memory (LSTM) – to improve 1-, 2- and 3-day deterministic forecasts of PM10, NOx and O3 at different sites in Greater Stockholm, Sweden. It is shown that the deterministic forecasts can be significantly improved using the ML models, but that the degree of improvement depends more on pollutant and site than on which ML algorithm is applied. Deterministic forecasts of PM10 are improved by the ML models through the input of lagged measurements and Julian day, which partly reflects seasonal variations not properly parameterised in the deterministic forecasts. A systematic discrepancy in the deterministic forecasts of the diurnal cycle of NOx is removed by the ML models through lagged measurements and calendar data such as hour of the day and weekday, reflecting the influence of local traffic emissions. For O3 at the urban background site, the local photochemistry is not properly accounted for by the relatively coarse Copernicus Atmosphere Monitoring Service (CAMS) ensemble model used here for forecasting O3, but this is compensated for by the ML models taking lagged measurements into account. The ML models performed similarly well across sites and pollutants: performance measures such as Pearson correlation, root mean square error (RMSE), mean absolute percentage error (MAPE) and mean absolute error (MAE) typically differed by less than 30 % between ML models. At the urban background site, the deviations between modelled and measured concentrations (RMSE) are smaller than the measurement uncertainties estimated according to recommendations by the Forum for Air Quality Modelling (FAIRMODE) in the context of the air quality directives. At the street canyon sites, model errors are higher and similar in size to the measurement uncertainties. Further work is needed to reduce deviations between model results and measurements for short periods with relatively high concentrations (peaks). Such peaks can be due to a combination of non-typical emissions and unfavourable meteorological conditions and may be difficult to forecast. We have also shown that deterministic forecasts of NOx at street canyon sites can be improved by ML models even if they are trained at other sites; for PM10 this was only possible using LSTM.
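The abstract does not include code; as a minimal sketch of the general workflow it describes, assuming hourly data in a pandas DataFrame and scikit-learn's RandomForestRegressor, the ML correction of a deterministic PM10 forecast could look roughly as follows. The file name, column names, lag choices and train/test split are illustrative assumptions, not the authors' setup.

```python
# Illustrative sketch (not the authors' code): correct a deterministic PM10
# forecast with a random forest using lagged measurements and calendar features.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

df = pd.read_csv("stockholm_pm10.csv", parse_dates=["time"])  # hypothetical file

# Calendar features reflecting seasonality and traffic patterns
df["julian_day"] = df["time"].dt.dayofyear
df["hour"] = df["time"].dt.hour
df["weekday"] = df["time"].dt.weekday

# Lagged observed concentrations (assuming hourly data)
for lag in (24, 48, 72):
    df[f"pm10_lag{lag}"] = df["pm10_obs"].shift(lag)
df = df.dropna()

features = ["pm10_det_forecast", "julian_day", "hour", "weekday",
            "pm10_lag24", "pm10_lag48", "pm10_lag72"]
train, test = df.iloc[:-8760], df.iloc[-8760:]  # hold out roughly one year

rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(train[features], train["pm10_obs"])
pred = rf.predict(test[features])

# Performance measures of the kind reported in the paper
obs = test["pm10_obs"].to_numpy()
rmse = np.sqrt(np.mean((pred - obs) ** 2))
mae = mean_absolute_error(obs, pred)
mape = mean_absolute_percentage_error(obs, pred)
r = np.corrcoef(obs, pred)[0, 1]  # Pearson correlation
print(f"RMSE={rmse:.1f}  MAE={mae:.1f}  MAPE={mape:.2%}  r={r:.2f}")
```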
An important aspect to consider when selecting an ML algorithm for an operational forecast system is that the decision-tree-based models (RF and XGB) can provide useful output on the importance of input features, which is not possible with neural network models such as LSTM; training and optimisation are also more complex for LSTM. A random forest model is now implemented operationally in the forecasts of air pollution and health risks in Stockholm. Further development of the tuning process and identification of more efficient predictors may make the forecasts more accurate.
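The feature-importance output mentioned above is exposed directly by the tree-based models. Continuing the hypothetical example, a sketch of how it could be inspected (assuming the scikit-learn and xgboost packages) is:

```python
# Sketch: feature importances from the fitted tree-based models; neural
# networks such as LSTM provide no equivalent attribute out of the box.
import pandas as pd

importances = pd.Series(rf.feature_importances_, index=features)
print(importances.sort_values(ascending=False))

# XGBoost offers the same interface:
# from xgboost import XGBRegressor
# xgb = XGBRegressor(n_estimators=500).fit(train[features], train["pm10_obs"])
# print(pd.Series(xgb.feature_importances_, index=features))
```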
