A comparison of machine learning methods for ozone pollution prediction

Qilong Pan, Fouzi Harrou, Ying Sun

doi:10.1186/s40537-023-00748-x

Open Access

https://doi.org/10.1186/s40537-023-00748-x

Abstract

Precise and efficient ozone (O₃) concentration prediction is crucial for weather monitoring and environmental policymaking because high O₃ pollution levels harm human health and ecosystems. However, the complexity of O₃ formation mechanisms in the troposphere makes it difficult to model O₃ accurately and quickly, especially in the absence of a process model. Data-driven machine learning techniques have demonstrated promising performance in modeling air pollution, particularly when a process model is unavailable. This study evaluates the predictive performance of nineteen machine learning models for ozone pollution prediction. Specifically, we assess how incorporating features using Random Forest affects O₃ concentration prediction and investigate the use of time-lagged measurements to improve prediction accuracy. Air pollution and meteorological data collected at King Abdullah University of Science and Technology are used. Results show that dynamic models using time-lagged data outperform static and reduced machine learning models. Under the RMSE metric, incorporating time-lagged data improves the accuracy of machine learning models by 300% and 200% relative to static and reduced models, respectively. Importantly, the best dynamic model with time-lagged information requires only 0.01 s, indicating its practical use. The Diebold-Mariano test, a statistical test used to compare the forecasting accuracy of models, is also conducted.
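The contrast between static and dynamic (time-lagged) models described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than the KAUST measurements, ordinary least squares stands in for the nineteen models evaluated in the study, and the helper names (`make_lagged`, `fit_predict`, `diebold_mariano`) are hypothetical. The Diebold-Mariano statistic shown is the simplest one-step-ahead form under squared-error loss, without the autocovariance correction used for longer horizons.

```python
import math
import numpy as np

def make_lagged(X, y, n_lags):
    """Append y[t-1], ..., y[t-n_lags] to the features observed at time t."""
    rows = range(n_lags, len(y))
    lags = np.array([[y[t - k] for k in range(1, n_lags + 1)] for t in rows])
    return np.hstack([X[n_lags:], lags]), y[n_lags:]

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def fit_predict(X_tr, y_tr, X_te):
    """Ordinary least squares with an intercept (a stand-in model)."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    A_te = np.column_stack([np.ones(len(X_te)), X_te])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return A_te @ coef

def diebold_mariano(e1, e2):
    """DM statistic for one-step forecasts under squared-error loss."""
    d = e1 ** 2 - e2 ** 2                      # loss differential
    stat = d.mean() / math.sqrt(d.var(ddof=1) / len(d))
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # two-sided, normal approx.
    return float(stat), p_value

# Synthetic series (assumption): the target depends on its own past value
# and on one exogenous input, loosely mimicking a pollutant time series.
rng = np.random.default_rng(0)
n = 600
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] + 0.5 * x[t] + 0.1 * rng.normal()
X = x.reshape(-1, 1)

n_lags = 2
X_dyn, y_dyn = make_lagged(X, y, n_lags)   # dynamic: input plus lagged targets
X_sta = X[n_lags:]                         # static: input only, same rows

split = 400                                # chronological train/test split
pred_sta = fit_predict(X_sta[:split], y_dyn[:split], X_sta[split:])
pred_dyn = fit_predict(X_dyn[:split], y_dyn[:split], X_dyn[split:])
y_te = y_dyn[split:]

rmse_static = rmse(y_te, pred_sta)
rmse_dynamic = rmse(y_te, pred_dyn)
dm_stat, dm_p = diebold_mariano(y_te - pred_sta, y_te - pred_dyn)
print(f"static RMSE={rmse_static:.3f}  dynamic RMSE={rmse_dynamic:.3f}")
print(f"DM statistic={dm_stat:.2f}  p-value={dm_p:.3g}")
```

A positive DM statistic here means the static model has the larger squared-error loss. The split is chronological rather than shuffled so that lagged targets in the test set never leak information from the future.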

Full Text