A comparison of linear regression, regularization, and machine learning algorithms to develop Europe-wide spatial models of fine particles and nitrogen dioxide

Jie Chen, John Gulliver, Klea Katsouyanni, Kathrin Wolf, Matthias Ketzel, Evangelia Samoli, Kees De Hoogh, Maciek Strak, Danielle Vienneau, Gerard Hoek, Aaron Van Donkelaar, Barbara Hoffmann, Randall V Martin, Per E Schwartz, Tom Bellander, Roel Vermeulen, Bert Brunekreef, Massimo Stafoggia, Ole Hertel, Mariska Bauwelinck, Ulla Arthur Hvidtfeldt, Nicole Janssen

doi:10.1016/j.envint.2019.104934

Abstract

Empirical spatial air pollution models have been applied extensively to assess exposure in epidemiological studies, with increasingly sophisticated and complex statistical algorithms beyond ordinary linear regression. However, different algorithms have rarely been compared in terms of their predictive ability.

This study compared 16 algorithms for predicting annual average fine particle (PM2.5) and nitrogen dioxide (NO2) concentrations across Europe. The evaluated algorithms included linear stepwise regression, regularization techniques and machine learning methods. Air pollution models were developed from 2010 routine monitoring data in the AIRBASE dataset maintained by the European Environment Agency (543 sites for PM2.5 and 2399 sites for NO2), using satellite observations, dispersion model estimates and land use variables as predictors. We compared the models by five-fold cross-validation (CV) and by external validation (EV) against annual average concentrations measured at 416 (PM2.5) and 1396 (NO2) sites from the ESCAPE study. We further assessed the correlations between the predictions of each pair of algorithms at the ESCAPE sites.

For PM2.5, the models performed similarly across algorithms, with a mean CV R² of 0.59 and a mean EV R² of 0.53. Generalized boosted machine, random forest and bagging performed best (CV R² ≈ 0.63; EV R² 0.58–0.61), while backward stepwise linear regression, support vector regression and artificial neural network performed less well (CV R² 0.48–0.57; EV R² 0.39–0.46). Most of the PM2.5 model predictions at ESCAPE sites were highly correlated (R² > 0.85), with the exception of predictions from the artificial neural network. For NO2, the models performed even more similarly across algorithms, with CV R² ranging from 0.57 to 0.62 and EV R² ranging from 0.49 to 0.51. The predicted concentrations from all algorithms at ESCAPE sites were highly correlated (R² > 0.9).
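The five-fold cross-validation used to compare the algorithms can be illustrated with a minimal, self-contained sketch. This is not the authors' code. It assumes a single hypothetical predictor `x` and a one-variable least-squares fit standing in for any of the 16 algorithms. It pools the out-of-fold predictions before scoring and uses the MSE-based definition R² = 1 - SSE/SST. The abstract does not state which R² definition or fold assignment the study used. External validation follows the same idea, except that the model is fitted once on all training sites and scored on an independent set of sites (here, the ESCAPE sites).

```python
import random
from statistics import mean


def fit_ols(x, y):
    """Fit y = a + b*x by ordinary least squares; assumes x is not constant."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r2_mse(obs, pred):
    """MSE-based R2 = 1 - SSE/SST; negative if worse than predicting the mean."""
    mo = mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1 - sse / sst


def kfold_cv_r2(x, y, k=5, seed=0):
    """Hold out each of k random folds in turn, fit on the remaining sites,
    predict the held-out sites, then score all out-of-fold predictions at once."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)          # random site-to-fold assignment
    folds = [idx[i::k] for i in range(k)]     # k roughly equal folds
    pred = [0.0] * len(y)
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        a, b = fit_ols([x[i] for i in train], [y[i] for i in train])
        for i in test:
            pred[i] = a + b * x[i]            # out-of-fold prediction
    return r2_mse(y, pred)
```

Calling `kfold_cv_r2(x, y)` with predictor values and measured concentrations returns one CV R². Scoring a different algorithm would mean replacing `fit_ols` with that learner while keeping the same fold loop, so that every algorithm is judged on identical held-out sites.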
For both pollutants, biases were low for all models except the artificial neural network. In all algorithms that allow the importance of variables to be assessed, dispersion model estimates and satellite observations were two of the most important predictors for PM2.5, whilst dispersion model estimates and traffic variables were the most important for NO2.

Different statistical algorithms performed similarly when modelling spatial variation in annual average air pollution concentrations using a large number of training sites.
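Bias and between-algorithm correlation measure different things. A minimal sketch follows, assuming bias is the mean of (prediction - observation) and agreement between two algorithms is the squared Pearson correlation of their predictions at the same sites. The abstract does not give the exact formulas, so both definitions are assumptions.

```python
from statistics import mean


def mean_bias(obs, pred):
    """Mean of (prediction - observation); positive means over-prediction."""
    return mean(p - o for o, p in zip(obs, pred))


def pearson_r2(u, v):
    """Squared Pearson correlation between two equally long sequences,
    e.g. the predictions of two algorithms at the same validation sites."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv * suv / (suu * svv)
```

Two prediction sets that differ only by a constant offset, such as `pred` and `[p + 2 for p in pred]`, have a squared correlation of 1 with each other but biases that differ by 2 units. A high R² between algorithms therefore shows that they rank sites alike. It does not by itself show that they predict the same concentrations.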

Journal: Environment International
Publication date: Jun 20, 2019
Citations: 218
License type: CC BY

Similar Papers

Assessment of multiple air pollutant exposures using a mobile monitoring approach: a comparison of spatial modeling with multiple regression algorithms
J Xu ... Z Bai
ISEE Conference Abstracts, Vol. 2020 (26 Oct 2020)

Forecast Municipal Solid Waste Generation in Sri Lanka
D.M.S.H Dissanayaka ... Shanmuganathan Vasanthapriyan
01 Dec 2019

Traffic speed prediction techniques in urban environments
Ahmad H Alomari ... Asalah A Jadah
Heliyon, Vol. 8 (01 Dec 2022)

Comparison of Model Performance on Housing Business Using Linear Regression, Random Forest Regressor, SVR, and Neural Network
Luke Mangala Soegianto ... Muhamad Fajar
Procedia Computer Science, Vol. 245 (01 Jan 2024)
