An artificial neural network ensemble approach to generate air pollution maps.

S Van Roode, I J Turias, J González-Enrique, J J Ruiz-Aguilar

doi:10.1007/s10661-019-7901-6

Abstract

This research proposes an artificial neural network (ANN) ensemble to estimate the hourly NO2 concentration at unsampled locations. Spatial interpolation methods and linear regression models with regularization were compared in order to build the ensemble. The case study is the region of the Bay of Algeciras (Spain), a heavily industrialized area with dense traffic. Air pollution data were collected from the monitoring network maintained by the Andalusian Government in the region. First, two very different methods, inverse distance weighting (IDW) and the least absolute shrinkage and selection operator (LASSO), were used and compared to generate maps of pollutant concentration values. Second, an ensemble approach was developed using the outputs of these models. The ensemble is based on an ANN with backpropagation learning. An experimental procedure using cross-validation was applied to compare the models on several performance indexes (correlation coefficient R, MSE, MAE and the d index of fitness), together with the Friedman test and Bonferroni correction. The results show that, in general terms, the proposed ensemble performs better than the single models. Validation was conducted using a leave-one-out strategy over the monitoring stations. IDW yields an average R of 0.72, a maximum R of 0.87, a minimum MSE of 78.00, a minimum MAE of 5.841 and a maximum d of 0.913. LASSO yields an average R of 0.76, a maximum R of 0.86, a minimum MSE of 59.13, a minimum MAE of 5.490 and a maximum d of 0.900. Finally, the ANN ensemble yields an average R of 0.77, a maximum R of 0.87, a minimum MSE of 54.05, a minimum MAE of 4.972 and a maximum d of 0.915.
The main objective was to produce adequate atmospheric pollutant concentration maps and, therefore, to obtain estimates for locations other than the monitoring stations. A further objective was to have a system that produces robust measurements. Such a system could be useful for missing data imputation and for detecting reading errors (e.g. unexpected deviations or calibration problems) in some of the nodes of a network.
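
As an illustration of the leave-one-out validation and the performance indexes named in the abstract, the following is a minimal sketch and not the authors' implementation. It covers only the IDW baseline, not LASSO or the ANN ensemble. It assumes that the d index is Willmott's index of agreement, that the IDW power is 2, and that no two stations share the same coordinates. All function names, station coordinates and hourly series are hypothetical, and the data are synthetic rather than taken from the Bay of Algeciras network.

```python
import numpy as np


def idw_weights(xy_known, xy_target, power=2.0):
    """Normalized inverse-distance weights of known stations for one target.

    Assumes every known station is at a nonzero distance from the target.
    """
    dist = np.linalg.norm(xy_known - xy_target, axis=1)
    w = 1.0 / dist ** power
    return w / w.sum()


def performance_indexes(obs, pred):
    """R, MSE, MAE and d (assumed here to be Willmott's index of agreement)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = obs - pred
    obs_mean = obs.mean()
    denom = np.sum((np.abs(pred - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    return {
        "R": float(np.corrcoef(obs, pred)[0, 1]),
        "MSE": float(np.mean(err ** 2)),
        "MAE": float(np.mean(np.abs(err))),
        "d": float(1.0 - np.sum(err ** 2) / denom),
    }


def leave_one_station_out(xy, series, power=2.0):
    """Hold out each station in turn and estimate its hourly series by IDW.

    xy:     (n_stations, 2) station coordinates
    series: (n_hours, n_stations) hourly concentrations
    Returns one dict of performance indexes per held-out station.
    """
    n_stations = xy.shape[0]
    results = []
    for i in range(n_stations):
        others = np.arange(n_stations) != i
        w = idw_weights(xy[others], xy[i], power)
        pred = series[:, others] @ w  # weighted average for every hour
        results.append(performance_indexes(series[:, i], pred))
    return results


# Synthetic example: 5 hypothetical stations, 200 hours of data.
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 10.0, size=(5, 2))
regional = 30.0 + 10.0 * np.sin(np.linspace(0.0, 12.0, 200))
series = regional[:, None] + rng.normal(0.0, 3.0, size=(200, 5))

for station, idx in enumerate(leave_one_station_out(xy, series)):
    print(station, {k: round(v, 3) for k, v in idx.items()})
```

Because the IDW weights depend only on station geometry, they are computed once per held-out station and reused for every hour.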
