Evaluation of multivariate time series clustering for imputation of air pollution data

Wedad Alahamade,Claire E Reeves,Iain Lake,Beatriz De La Iglesia

doi:10.5194/gi-10-265-2021

Open Access

Abstract

Air pollution is one of the world's leading risk factors for death, with 6.5 million deaths per year worldwide attributed to air-pollution-related diseases. Understanding the behaviour of certain pollutants through air quality assessment can produce improvements in air quality management that will translate into health and economic benefits. However, problems with missing data and uncertainty hinder that assessment. We are motivated by the need to enhance the air pollution data available. We focus on the problem of missing air pollutant concentration data, which arises either because only a limited set of pollutants is measured at a monitoring site or because an instrument is not operating, so that a particular pollutant is not measured for a period of time. In our previous work, we proposed models that can impute a whole missing time series to enhance air quality monitoring. Some of these models are based on a multivariate time series (MVTS) clustering method. Here, we apply our method to real data and show how different graphical and statistical model evaluation functions enable us to select the imputation model that produces the most plausible imputations. We then compare the Daily Air Quality Index (DAQI) values obtained after imputation with observed values incorporating missing data. Our results show that an ensemble model, which aggregates the spatial similarity obtained from the geographical correlation between monitoring stations and the fused temporal similarity between pollutant concentrations, produces very good imputation results. Furthermore, the analysis enhances understanding of the different pollutant behaviours and of the characteristics of different stations according to their environmental type.

Highlights

Time series (TS) analysis has received much attention in recent decades due to its importance in many real-world applications such as earthquake prediction (Di Bello et al., 1996), weather forecasting (Carbajal-Hernández et al., 2012), air pollution forecasting (Du et al., 2020), and human activity recognition (Seto et al., 2015).
Model 6 (Median), the model that applies an ensemble technique to the other models, gives the lowest average error (RMSE), the highest Pearson correlation coefficient (R), and the highest agreement between imputed and observed concentrations (IOA) for O3, PM2.5, and PM10.
NO2 is shorter-lived than the other pollutants and shows greater spatial variability, with concentrations being strongly influenced by the environment type.
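
The evaluation measures named in the highlights (RMSE, R, and IOA) and a median ensemble can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the IOA formula used here is Willmott's original index of agreement, which may differ from the variant used in the paper, and the function names and sample values are hypothetical.

```python
import numpy as np


def _paired(obs, pred):
    """Return observed/predicted arrays restricted to time steps where both exist."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    keep = ~(np.isnan(obs) | np.isnan(pred))
    return obs[keep], pred[keep]


def rmse(obs, pred):
    """Root mean squared error between observed and imputed values."""
    obs, pred = _paired(obs, pred)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and imputed values."""
    obs, pred = _paired(obs, pred)
    return float(np.corrcoef(obs, pred)[0, 1])


def index_of_agreement(obs, pred):
    """Willmott's original index of agreement (assumed variant; 1.0 is perfect)."""
    obs, pred = _paired(obs, pred)
    obs_mean = obs.mean()
    numerator = np.sum((pred - obs) ** 2)
    denominator = np.sum((np.abs(pred - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    return float(1.0 - numerator / denominator)


def median_ensemble(candidates):
    """Combine candidate imputations (rows = models, columns = time steps)
    by taking the per-time-step median, ignoring missing values."""
    return np.nanmedian(np.asarray(candidates, dtype=float), axis=0)


# Hypothetical hourly concentrations; NaN marks a missing observation.
observed = [10.0, 12.0, np.nan, 15.0, 11.0]
candidates = [
    [9.0, 13.0, 14.0, 16.0, 10.0],   # hypothetical candidate model A
    [11.0, 12.0, 13.0, 14.0, 12.0],  # hypothetical candidate model B
    [12.0, 10.0, 15.0, 15.0, 11.0],  # hypothetical candidate model C
]

imputed = median_ensemble(candidates)
scores = {
    "RMSE": rmse(observed, imputed),
    "R": pearson_r(observed, imputed),
    "IOA": index_of_agreement(observed, imputed),
}
print(imputed, scores)
```

Lower RMSE and higher R and IOA indicate closer agreement, so a model-selection step along these lines would compare the three scores per pollutant across the candidate models.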

Summary

Introduction

Time series (TS) analysis has received much attention in recent decades due to its importance in many real-world applications such as earthquake prediction (Di Bello et al., 1996), weather forecasting (Carbajal-Hernández et al., 2012), air pollution forecasting (Du et al., 2020), and human activity recognition (Seto et al., 2015). In this study we focus on the four main pollutants: particulate matter less than 2.5 μm in diameter (PM2.5), particulate matter less than 10 μm in diameter (PM10), ozone (O3), and nitrogen dioxide (NO2). These pollutants are measured hourly at various monitoring stations.

Table fragment (column headings lost in extraction): Model 2 (CA+ENV), background suburban, 5, 6.590; Model 2 (CA+ENV), traffic urban, 65, −0.014.

Panel (a) shows that stations classed as traffic urban are associated with the highest RMSE (0.62), while industrial suburban stations have the lowest RMSE (0.36).

Journal: Geoscientific Instrumentation, Methods and Data Systems
Publication Date: Nov 3, 2021
Citations: 1
License type: CC BY 4.0
