Abstract
Accurate and efficient prediction of ozone (O₃) concentration is crucial for weather monitoring and environmental policymaking because high O₃ pollution levels harm human health and ecosystems. However, the complexity of O₃ formation mechanisms in the troposphere makes it difficult to model O₃ accurately and quickly, especially in the absence of a process model. Data-driven machine learning techniques have shown promising performance in modeling air pollution, particularly when a process model is unavailable. This study evaluates the predictive performance of nineteen machine learning models for ozone pollution prediction. Specifically, we assess how incorporating features selected using Random Forest affects O₃ concentration prediction, and we investigate whether time-lagged measurements improve prediction accuracy. The air pollution and meteorological data were collected at King Abdullah University of Science and Technology. Results show that dynamic models using time-lagged data outperform static and reduced machine learning models. Incorporating time-lagged data improves the accuracy of machine learning models, as measured by RMSE, by 300% and 200% relative to static and reduced models, respectively. Importantly, the best dynamic model with time-lagged information requires only 0.01 s, indicating its suitability for practical use. The Diebold-Mariano test, a statistical test for comparing the forecasting accuracy of models, is also conducted.
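As an illustration of two ideas mentioned above, the following minimal sketch shows how time-lagged inputs can be built from a single measurement series, and how RMSE and a Diebold-Mariano statistic can be computed for two competing forecasts. It is not the study's implementation: the function names (`make_lagged`, `rmse`, `diebold_mariano`) are hypothetical, and the test shown assumes one-step-ahead forecasts, squared-error loss, and a normal approximation with no autocorrelation correction.

```python
import math


def make_lagged(series, n_lags):
    """Build a lagged design matrix from one series.

    Row i of X holds the n_lags values preceding y[i], oldest first.
    """
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))
        y.append(series[t])
    return X, y


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def diebold_mariano(y_true, pred_a, pred_b):
    """Diebold-Mariano statistic under squared-error loss.

    Assumes one-step-ahead forecasts, so the variance of the loss
    differential is estimated without autocovariance terms.
    A negative statistic favours model A. Returns (statistic, p_value),
    where p_value is two-sided under a standard normal approximation.
    """
    # Loss differential: squared error of A minus squared error of B.
    d = [(t - a) ** 2 - (t - b) ** 2 for t, a, b in zip(y_true, pred_a, pred_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    if var_d == 0:
        raise ValueError("loss differential is constant; statistic undefined")
    stat = mean_d / math.sqrt(var_d / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value


if __name__ == "__main__":
    # Synthetic toy values, not data from the study.
    series = [30.0, 32.0, 35.0, 33.0, 31.0, 34.0, 36.0, 35.0]
    X, y = make_lagged(series, n_lags=2)
    persistence = [row[-1] for row in X]            # repeat the last observed value
    lag_mean = [sum(row) / len(row) for row in X]   # average of the lagged values
    print("RMSE persistence:", rmse(y, persistence))
    print("RMSE lag mean:   ", rmse(y, lag_mean))
    print("DM (persistence vs lag mean):", diebold_mariano(y, persistence, lag_mean))
```

In practice the rows of `X` would be passed to a regression model together with any meteorological covariates; the two baseline forecasts here only stand in for fitted models so that the comparison can be run end to end.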