Abstract

Six generalized machine learning (ML) ensemble models were developed to predict, in real time, the hourly ozone concentration of the following day. The models were used to forecast next-day hourly ozone concentrations for all of 2017 in the city of Seoul, South Korea. The training dataset was prepared from observed meteorological and air pollution parameters for the 2014–2016 period. Each ensemble model fuses two regression models: a low-ozone peak model and a high-ozone model. Both were built with extremely randomized trees and deep neural networks. A regularization approach was also adopted that adjusts the model toward capturing higher ozone peaks by resampling the training dataset based on the peaks. Adopting the proposed ML ensemble forecasting method over single-model ML techniques as part of mainstream air quality forecasting practice would be beneficial for several reasons. First, the proposed method, which captures daily maximum ozone concentrations during the high-ozone season (April–September), reduces the ozone peak prediction error by 5 to 30 ppb. Second, unlike station-specific (independent) ML models, whose training data contain more frequent low-ozone values, the proposed models are trained on a uniformly distributed dataset, so they are more generalizable. As a result, unlike station-specific models, they retain their accuracy (yearly index of agreement, IOA = 0.84–0.89) at all stations, with an increase in IOA. Third, the proposed models make predictions several times faster, because a single training run suffices to predict for an entire station network. Based on a categorical analysis of the training dataset, an algorithm was proposed for selecting the most suitable model for each month. The resulting "best" model further improves the accuracy of both the ML ensemble and the individual models by up to 2.4%. This study shows that the ML ensemble modeling approach is a fast, reliable, and robust technique that can benefit environmental decision-makers in urban regions.
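
For readers unfamiliar with the IOA metric quoted above, the following is a minimal sketch of how an index of agreement can be computed. It assumes the study uses Willmott's original definition, IOA = 1 − Σ(P − O)² / Σ(|P − Ō| + |O − Ō|)², where O and P are the observed and predicted values and Ō is the observed mean; the abstract does not state which variant was applied. The function name `index_of_agreement` and the sample values are illustrative and are not taken from the study.

```python
import numpy as np


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement (assumed definition); 1.0 means a perfect match."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    obs_mean = obs.mean()

    # Sum of squared prediction errors.
    numerator = np.sum((pred - obs) ** 2)
    # "Potential error": distances of predictions and observations from the observed mean.
    denominator = np.sum((np.abs(pred - obs_mean) + np.abs(obs - obs_mean)) ** 2)

    if denominator == 0.0:
        # Every observation and prediction equals the observed mean, so they agree exactly.
        return 1.0
    return 1.0 - numerator / denominator


# Illustrative hourly ozone values in ppb (made up for this example).
observed_ppb = [20.0, 40.0, 60.0]
predicted_ppb = [30.0, 40.0, 50.0]
ioa = index_of_agreement(observed_ppb, predicted_ppb)
```

An IOA of 1 indicates perfect agreement and values near 0 indicate none, so the reported range of 0.84–0.89 would correspond to close agreement between predicted and observed concentrations under this definition.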
