Abstract

The ability to accurately model and predict the ambient concentration of Particulate Matter (PM) is essential for effective air quality management and policy development. Various statistical approaches exist for modelling air pollutant levels. In this paper, several approaches, including linear, non-linear, and machine learning methods, are evaluated for the prediction of urban PM10 concentrations in the City of Makkah, Saudi Arabia. The models employed are the Multiple Linear Regression Model (MLRM), Quantile Regression Model (QRM), Generalised Additive Model (GAM), and 1-way (BRT1) and 2-way (BRT2) Boosted Regression Trees. Several meteorological parameters and chemical species measured during 2012 are used as covariates in the models. Various statistical metrics, including the Mean Bias Error (MBE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), the fraction of predictions within a Factor of Two (FACT2), correlation coefficient (R), and Index of Agreement (IA), are calculated to compare the predictive performance of the models. Results show that both MLRM and QRM captured the mean PM10 levels. However, QRM outperformed the other models in capturing the variations in PM10 concentrations. Based on the values of the error indices, QRM showed better performance in predicting hourly PM10 concentrations. Its superiority over the other models is explained by the ability of QRM to model the contribution of covariates at different quantiles of the modelled variable (here PM10). In this way, QRM provides a better approximation procedure than the other modelling approaches, which consider a single central tendency response to a set of independent variables. Numerous recent studies have used these modelling approaches; however, this is the first study that compares their performance for predicting PM10 concentrations.
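The evaluation metrics named in the abstract can be illustrated with a short sketch. This is not the authors' code: the function name `evaluation_metrics` is hypothetical, the sign convention for MBE (predicted minus observed) is an assumption, and IA is computed with Willmott's original index of agreement, which may differ from the variant used in the paper.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return MBE, MAE, RMSE, FACT2, R and IA for paired observed/predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equally long")

    # Error is taken as predicted minus observed (assumed convention),
    # so a negative MBE means under-prediction.
    errors = [p - o for o, p in zip(observed, predicted)]
    mbe = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Fraction of predictions within a factor of two of the observation.
    # Pairs with a non-positive observation are counted as outside the range.
    within = sum(
        1 for o, p in zip(observed, predicted)
        if o > 0 and 0.5 * o <= p <= 2.0 * o
    )
    fact2 = within / n

    # Pearson correlation coefficient.
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o > 0 and var_p > 0:
        r = cov / math.sqrt(var_o * var_p)
    else:
        r = float("nan")

    # Willmott's index of agreement (assumed variant).
    ia_denom = sum(
        (abs(p - mean_o) + abs(o - mean_o)) ** 2
        for o, p in zip(observed, predicted)
    )
    if ia_denom > 0:
        ia = 1.0 - sum(e * e for e in errors) / ia_denom
    else:
        ia = float("nan")

    return {"MBE": mbe, "MAE": mae, "RMSE": rmse, "FACT2": fact2, "R": r, "IA": ia}
```

Calling `evaluation_metrics(observed, predicted)` on two equally long lists of hourly PM10 values returns a dictionary keyed by metric name. A model with MBE close to zero can still have a large RMSE, which is why several indices are reported together.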

Highlights

  • All results presented were obtained from data independent of the model development process, and therefore reflect the real forecasting ability of the models

  • The Quantile Regression Model (QRM) under-predicted the mean observed value by only 3.2 μg/m3, while the other models under-predicted it substantially: Multiple Linear Regression Model (MLRM) by 31.1 μg/m3; 2-way Boosted Regression Trees (BRT) by 41.1 μg/m3; Generalised Additive Model (GAM) by 41.7 μg/m3; 1-way BRT by 43.9 μg/m3

Introduction

Sayegh et al., Aerosol and Air Quality Research, 14: 653–665, 2014

The main objectives of modelling air quality are to obtain air quality forecasts, quantify temporal trends in air pollutants, increase scientific understanding of the underlying mechanisms for production and destruction of pollutants, and estimate potential air pollution related health effects (e.g., Baur et al., 2004).

Many previous investigations have used statistical modelling techniques and machine learning methods to analyse and predict concentrations of Particulate Matter (PM) with an aerodynamic diameter of up to 10 μm (PM10). Aldrin and Haff (2005) used generalised additive modelling to relate traffic volumes, meteorological conditions, and [...] meteorological variables (e.g., wind speed and temperature), persistence (which is likely to reflect the multiday synoptic time scales that modulate dispersion conditions), and co-pollutants (e.g., Ozone (O3) and Nitrogen Oxides (NOx)) were useful for predicting PM.

A number of studies have compared the performance of various modelling approaches to determine the best model for the prediction of PM10 in different locations. Kukkonen et al. (2003) compared the performance of Neural Networks (NN), a linear regression model, and a deterministic modelling system in the prediction of both PM10 and NO2 concentrations in Helsinki. Chaloulakou et al. (2003), Papanastasiou et al. (2007), and Ul-Saufie et al. (2011), comparing Multiple Linear Regression (MLR) with NN models for the prediction of PM10, concluded that the non-linear NN method showed better performance. Pires et al. (2008) investigated the performance of five linear models: MLR, principal component regression, independent component regression, quantile regression, and partial least squares regression. Westmoreland et al. (2007) compared the performance of GAM with a dispersion modelling approach (ADMS-Urban) and favoured the use of GAM, whereas Baur et al. (2004) compared the performance of the Quantile Regression Model (QRM) with MLRM and found that QRM significantly outperformed MLRM for predicting O3 concentrations. Carslaw et al. (2009) suggested the use of the Boosted Regression Trees (BRT) model for predicting NOx concentrations at a mixed-source location.
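To make the distinction between quantile regression and a single central-tendency fit more concrete, the following minimal sketch shows the quantile (pinball) loss that quantile regression minimises. It is an illustration only, not the model used in any of the studies cited: the function names `pinball_loss` and `best_constant` are hypothetical, and the simplest possible "model" (one constant prediction with no covariates) is assumed.

```python
def pinball_loss(observed, predicted, tau):
    """Mean quantile (pinball) loss for a quantile level tau in (0, 1)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(observed) == 0 or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equally long")
    total = 0.0
    for o, p in zip(observed, predicted):
        diff = o - p
        # Under-predictions (diff >= 0) are weighted by tau,
        # over-predictions by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(observed)


def best_constant(observed, tau):
    """Observed value that, used as a constant prediction, minimises the pinball loss."""
    candidates = sorted(set(observed))
    return min(
        candidates,
        key=lambda c: pinball_loss(observed, [c] * len(observed), tau),
    )
```

For a made-up series such as `[10, 20, 30, 40, 1000]`, `best_constant(series, 0.5)` selects the median, while a high quantile level such as 0.9 moves the fitted value towards the extreme observation. A full quantile regression replaces the constant with a linear function of the covariates and fits a separate set of coefficients for each quantile level.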
