Comparison of several linear statistical models to predict tropospheric ozone concentrations

J.C.M. Pires, M.C.M. Alvim-Ferraz, M.C. Pereira, F.G. Martins

doi:10.1080/00949655.2011.623233

Abstract

This study evaluates the performance of five linear statistical models in predicting next-day hourly average ozone concentrations. The selected models are: (i) multiple linear regression, (ii) principal component regression, (iii) independent component regression (ICR), (iv) quantile regression (QR) and (v) partial least squares regression (PLSR). To the best of the authors' knowledge, no study comparing the performance of these five linear models for predicting tropospheric ozone concentrations has been presented before. Moreover, this is the first time that ICR has been applied for this purpose. The ozone predictors considered are meteorological data (hourly averages of temperature, relative humidity and wind speed) and environmental data (hourly average concentrations of sulphur dioxide, carbon monoxide, nitrogen oxide, nitrogen dioxide and ozone) of the previous day, collected at an urban site with traffic influences. The analysed periods were May and June 2003. The QR model presents better performance in the training step because it tries to model the entire distribution of the O3 concentrations. However, it presents the worst predictions in the test step. This means that a procedure should be found that estimates the percentiles of the output variable in the test data set more precisely than the one applied (the k-nearest neighbours algorithm). Of the five statistical models tested in this study, the PLSR model presents the best predictions of tropospheric ozone concentrations.
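
The abstract does not describe how the k-nearest neighbours algorithm estimates output percentiles for the test data. The following is a minimal sketch of one plausible reading, not the authors' procedure: each test point is assigned the mean empirical percentile rank of its k nearest training points in predictor space. The function names, the choice of Euclidean distance, and the toy data (temperature and relative humidity as predictors, ozone as output) are all assumptions made for illustration.

```python
import math


def percentile_ranks(values):
    """Empirical percentile rank, in (0, 1], of each value within the list."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for position, index in enumerate(order):
        ranks[index] = (position + 1) / n
    return ranks


def knn_percentile(train_x, train_y, query, k=3):
    """Estimate the output percentile of `query` (hypothetical helper).

    Takes the mean percentile rank of the k training points closest to
    `query` in Euclidean distance. Predictors are used unscaled here;
    in practice they would likely need standardising first.
    """
    ranks = percentile_ranks(train_y)
    distances = [math.dist(x, query) for x in train_x]
    nearest = sorted(range(len(train_x)), key=lambda i: distances[i])[:k]
    return sum(ranks[i] for i in nearest) / k


# Invented toy data: (temperature, relative humidity) -> ozone concentration.
train_x = [(10.0, 80.0), (15.0, 70.0), (20.0, 60.0), (25.0, 50.0), (30.0, 40.0)]
train_y = [20.0, 35.0, 50.0, 65.0, 80.0]

estimated = knn_percentile(train_x, train_y, query=(24.0, 52.0), k=3)
print(f"estimated output percentile: {estimated:.2f}")
```

An estimate of this kind selects which fitted quantile of the QR model is used for a given test point, so any error in the estimated percentile carries over directly to the predicted concentration. This is consistent with the abstract's observation that QR fits the training data well but predicts poorly in the test step.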

Full Text