Use of data imputation tools to reconstruct incomplete air quality datasets: A case-study in Temuco, Chile

María Elisa Quinteros, Siyao Lu, Carola Blazquez, Juan Pablo Cárdenas-R, Ximena Ossa, Juana-María Delgado-Saborit, Roy M Harrison, Pablo Ruiz-Rudolph

doi:10.1016/j.atmosenv.2018.11.053

Abstract

Missing data in air quality datasets is a common problem, and it is far more severe in small cities and localities. This poses a major challenge for environmental epidemiology, because high pollutant exposures occur in these settings worldwide, and gaps in the data hinder health studies that could later inform local and international policies. Here, we propose imputation methods as a tool for reconstructing air quality datasets and apply this approach, as a case study, to an air quality dataset from Temuco, a mid-size city in Chile. We attempted to reconstruct the database by comparing five approaches: mean imputation, conditional mean imputation, K-Nearest Neighbor imputation, multiple imputation, and Bayesian Principal Component Analysis imputation. As a basis for the imputation methods, linear regression models were fitted for PM2.5 against other air quality and meteorological variables. The methods were challenged against validation sets from which data had been artificially removed. The imputation methods reconstructed the dataset with good performance in terms of completeness, error, and bias, even when challenged against the validation sets. Performance improved when covariates from a second monitoring station in Temuco were included. K-Nearest Neighbor imputation performed slightly better than multiple imputation in error (25% vs. 27%) and bias (2.1% vs. 3.9%) but had lower completeness (70% vs. 100%). In summary, our results show that imputation methods can be a useful tool for reconstructing air quality datasets in real-life situations.
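The validation design described above (artificially removing values, imputing them, and scoring completeness, error, and bias) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code or data: the data generator, the use of a hypothetical second-station PM2.5 series as the only covariate, k = 5 neighbours, and the metric definitions (relative RMSE and relative bias as percentages of the observed mean) are all assumptions. Only three of the five methods are shown; multiple imputation and Bayesian PCA imputation are omitted.

```python
import math
import random
import statistics


def make_synthetic_series(n=1000, seed=1):
    """Hypothetical paired series: y = PM2.5 at the target station,
    x = PM2.5 at a second station (both in ug/m3). Synthetic, not real data."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(5.0, 120.0)
        y = 0.8 * x + 5.0 + rng.gauss(0.0, 6.0)
        data.append((max(y, 0.0), x))
    return data


def mask(data, frac_y=0.2, frac_x=0.1, seed=2):
    """Artificially remove target values (the validation set) and,
    independently, some covariate values."""
    rng = random.Random(seed)
    n = len(data)
    hidden = set(rng.sample(range(n), int(frac_y * n)))
    no_cov = set(rng.sample(range(n), int(frac_x * n)))
    y_obs = [None if i in hidden else y for i, (y, _) in enumerate(data)]
    x_obs = [None if i in no_cov else x for i, (_, x) in enumerate(data)]
    return y_obs, x_obs, sorted(hidden)


def training_pairs(y_obs, x_obs):
    """(x, y) pairs where both the covariate and the target are observed."""
    return [(x, y) for y, x in zip(y_obs, x_obs) if y is not None and x is not None]


def mean_impute(y_obs, x_obs, idx):
    """Mean imputation: every gap gets the mean of the observed targets."""
    m = statistics.fmean(y for y in y_obs if y is not None)
    return {i: m for i in idx}


def conditional_mean_impute(y_obs, x_obs, idx):
    """Conditional mean imputation: prediction from an OLS fit y = a + b*x.
    Gaps whose covariate is also missing stay missing (None)."""
    pairs = training_pairs(y_obs, x_obs)
    mx = statistics.fmean(x for x, _ in pairs)
    my = statistics.fmean(y for _, y in pairs)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x, _ in pairs)
    a = my - b * mx
    return {i: (None if x_obs[i] is None else a + b * x_obs[i]) for i in idx}


def knn_impute(y_obs, x_obs, idx, k=5):
    """KNN imputation: mean target of the k donors with the closest covariate."""
    pairs = training_pairs(y_obs, x_obs)
    out = {}
    for i in idx:
        if x_obs[i] is None:
            out[i] = None  # no covariate means no neighbours can be found
            continue
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x_obs[i]))[:k]
        out[i] = statistics.fmean(y for _, y in nearest)
    return out


def evaluate(truth, imputed):
    """Completeness (%), relative RMSE (%) and relative bias (%) over the
    gaps that were actually filled. Metric definitions are assumptions."""
    done = [(truth[i], v) for i, v in imputed.items() if v is not None]
    completeness = 100.0 * len(done) / len(imputed)
    mean_true = statistics.fmean(t for t, _ in done)
    rmse = math.sqrt(statistics.fmean((v - t) ** 2 for t, v in done))
    bias = statistics.fmean(v - t for t, v in done)
    return completeness, 100.0 * rmse / mean_true, 100.0 * bias / mean_true


def run_validation():
    """Mask, impute with each method, and score against the removed values."""
    data = make_synthetic_series()
    truth = [y for y, _ in data]
    y_obs, x_obs, hidden = mask(data)
    methods = {"mean": mean_impute,
               "conditional mean": conditional_mean_impute,
               "KNN": knn_impute}
    return {name: evaluate(truth, fn(y_obs, x_obs, hidden)) for name, fn in methods.items()}


if __name__ == "__main__":
    for name, (comp, err, bias) in run_validation().items():
        print(f"{name:>16}: completeness={comp:5.1f}%  error={err:5.1f}%  bias={bias:+5.1f}%")
```

In this sketch, the covariate-based methods cannot fill a gap when the covariate is missing at the same time step, so their completeness drops below 100% while mean imputation always fills every gap. This illustrates one way a trade-off between lower error and lower completeness can arise.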


Journal: Atmospheric Environment
Publication Date: Dec 7, 2018
Citations: 35
License type: CC BY-NC-ND


Similar Papers

Methods for estimating the AIDS incubation time distribution when date of seroconversion is censored.
Ronald B Geskus
Statistics in Medicine | VOL. 20 | 16 Feb 2001

Imputation Analysis for Time Series Air Quality (PM10) Data Set: A Comparison of Several Methods
N Shaadan ... N A M Rahim
Journal of Physics: Conference Series | VOL. 1366 | 01 Nov 2019

Simulation study on missing data imputation methods for longitudinal data in cohort studies
Y M Li ... F Y Chen
Zhonghua liu xing bing xue za zhi = Zhonghua liuxingbingxue zazhi | VOL. 42 | 10 Oct 2021

Enhancing Material Property Predictions through Optimized KNN Imputation and Deep Neural Network Modeling
Khan Murad Ali
IgMin Research | VOL. 2 | 13 Jun 2024
