Forecasting 24 h averaged PM2.5 concentration in the Aburrá Valley using tree-based machine learning models, global forecasts, and satellite information

Jhayron S Pérez-Carrasquilla, Juan Manuel Sánchez, K Santiago Hernández, Paola A Montoya, Mauricio Ramírez

doi:10.5194/ascmo-9-121-2023

Abstract

We develop a framework to forecast 24 h averaged particulate matter (PM2.5) concentrations 4 d in advance at ground-based stations over the metropolitan area of the Aburrá Valley, Colombia. The input variables are gathered from a highly diverse set of sources, including in situ real-time PM2.5 observations, meteorological forecasts from the Global Forecasting System (GFS), aerosol optical depth (AOD) forecasts from the European Copernicus Atmosphere Monitoring Service (CAMS), and the Moderate Resolution Imaging Spectroradiometer (MODIS) active fire products. We compare the performance of two tree-based machine learning (ML) methods, random forests (RFs) and gradient boosting (GB), with linear regression as a baseline for error metrics. A known disadvantage of tree-based models is their inability to make skillful predictions outside the domain on which they were trained. To address that problem, we implement piecewise linear regression learners within the models. Additionally, to enhance the performance of the models, we use a customized loss function that considers the probability distribution of the target values. Tree-based models substantially outperform linear regression, with GB showing the best results at most of the 19 stations used in this study. We also test two approaches for the multi-step output problem, a direct multi-output (MO) scheme and a recursive (RC) scheme, with the GB–MO approach showing the best results. According to the performance analysis, predictability is lower for values far from the mean and decreases between 06:00 LT (local time) and the early afternoon, when the boundary layer expands. To contribute to understanding the sources of predictability and uncertainty of air quality in the city, we perform a feature importance analysis, which reveals that the relevance of the different independent variables is a function of the lead time. In particular, apart from past concentrations, the variables that most affect predictability are the forecasted AOD, the integrated fire radiative power over a forecasted back trajectory (BT-IFRP), and the predicted planetary boundary layer height (PBLH). In the testing period, the models were able to forecast poor-air-quality events in the valley more than 1 d in advance. This study serves as a framework for developing and evaluating ML-based air quality forecasting models over the Andean region.
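
The difference between the direct and recursive schemes mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy series, its coefficients, the helper names, and the use of a single-lag least-squares line as a stand-in for the GB and RF learners. The direct scheme is approximated by fitting one regressor per lead time, while the recursive scheme fits a single one-step model and feeds its own predictions back in.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y ~ a + b * x with one predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def fit_direct(series, horizons):
    """Direct scheme: a separate model per lead time h, mapping y[t] to y[t+h]."""
    return {h: fit_line(series[:-h], series[h:]) for h in horizons}

def predict_direct(models, last_value):
    """Each lead time is predicted from the last observation only."""
    return {h: a + b * last_value for h, (a, b) in models.items()}

def fit_recursive(series):
    """Recursive scheme: a single one-step-ahead model, y[t] to y[t+1]."""
    return fit_line(series[:-1], series[1:])

def predict_recursive(model, last_value, n_steps):
    """Apply the one-step model repeatedly, feeding predictions back as input."""
    a, b = model
    preds, current = {}, last_value
    for h in range(1, n_steps + 1):
        current = a + b * current
        preds[h] = current
    return preds

# Hypothetical noise-free daily series following y[t+1] = 5 + 0.8 * y[t].
series = [40.0]
for _ in range(29):
    series.append(5.0 + 0.8 * series[-1])

horizons = [1, 2, 3, 4]  # lead times in days
direct = predict_direct(fit_direct(series, horizons), series[-1])
recursive = predict_recursive(fit_recursive(series), series[-1], n_steps=4)

for h in horizons:
    print(f"lead {h} d: direct={direct[h]:.4f} recursive={recursive[h]:.4f}")
```

On this noise-free toy series the two schemes coincide, because each lead time is an exact linear function of the last observation. With noisy data and nonlinear learners they generally differ: the recursive scheme can accumulate its own one-step errors, whereas the direct scheme needs a model (or output) per lead time.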


Journal: Advances in Statistical Climatology, Meteorology and Oceanography
Publication Date: Dec 22, 2023
Citations: 1
License type: CC BY 4.0

