Handling Complex Missing Data Using Random Forest Approach for an Air Quality Monitoring Dataset: A Case Study of Kuwait Environmental Data (2012 to 2018)

Ahmad R Alsaber, Jiazhu Pan, Adeeba Al-Hurban

doi:10.3390/ijerph18031333

Abstract

In environmental research, missing data are often a challenge for statistical modeling. This paper examines advanced techniques for handling missing values in an air quality data set using a multiple imputation (MI) approach. Three missing data mechanisms are applied to the data set: missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR). Five levels of missingness are considered: 5%, 10%, 20%, 30%, and 40%. The imputation method used is missForest, an iterative imputation method based on the random forest approach. Air quality data were gathered from five monitoring stations in Kuwait and aggregated to a daily basis. A logarithmic transformation was applied to all pollutant data to normalize their distributions and minimize skewness. We found high levels of missing values in five of the variables: 18.4%, 18.5%, 57.4%, 19.0%, and 18.2%. Climatological data (i.e., air temperature, relative humidity, wind direction, and wind speed) were used as control variables for better estimation. The results show that imputation under the MAR mechanism had the lowest root mean square error (RMSE) and mean absolute error (MAE). We conclude that MI using the missForest approach estimates missing values with a high level of accuracy. missForest had the lowest imputation error (RMSE and MAE) among the imputation methods compared and can therefore be considered appropriate for analyzing air quality data.
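
As a rough illustration of the evaluation design the abstract describes, the Python sketch below removes values from a log-transformed synthetic pollutant series under MCAR, MAR, and NMAR at the five missingness levels, fills them in, and scores the result with RMSE and MAE. Everything in it is an assumption made for illustration: the data are simulated, the names `missing_mask`, `scaled`, `rmse`, and `mae` are hypothetical, the weighting used to favor high values is arbitrary, and the fill-in step is a plain mean of the observed values standing in for an imputer. It is not missForest and not the authors' procedure.

```python
import math
import random


def scaled(xs):
    """Min-max scale a list of numbers to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def missing_mask(values, covariate, rate, mechanism, rng):
    """Return a list of booleans, True where a value is to be removed.

    MCAR: every value is equally likely to be removed.
    MAR:  removal is more likely where an observed covariate is high.
    NMAR: removal is more likely where the value itself is high.
    """
    n = len(values)
    if mechanism == "MCAR":
        weights = [1.0] * n
    elif mechanism == "MAR":
        weights = [0.1 + s for s in scaled(covariate)]
    elif mechanism == "NMAR":
        weights = [0.1 + s for s in scaled(values)]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    # Weighted sampling without replacement: draw one random key per
    # index and keep the k largest keys (Efraimidis-Spirakis scheme).
    keys = [rng.random() ** (1.0 / w) for w in weights]
    k = round(rate * n)
    chosen = sorted(range(n), key=lambda i: keys[i], reverse=True)[:k]
    mask = [False] * n
    for i in chosen:
        mask[i] = True
    return mask


def rmse(true, pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def mae(true, pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)


# Synthetic daily data (assumption): a right-skewed pollutant series that
# depends on air temperature, log-transformed to reduce skewness.
rng = random.Random(0)
n = 500
temperature = [rng.uniform(10.0, 45.0) for _ in range(n)]
pollutant = [math.exp(0.03 * t + rng.gauss(0.0, 0.3)) for t in temperature]
log_pollutant = [math.log(p) for p in pollutant]

results = {}
for mechanism in ("MCAR", "MAR", "NMAR"):
    for rate in (0.05, 0.10, 0.20, 0.30, 0.40):
        mask = missing_mask(log_pollutant, temperature, rate, mechanism, rng)
        observed = [v for v, m in zip(log_pollutant, mask) if not m]
        truth = [v for v, m in zip(log_pollutant, mask) if m]
        # Stand-in imputer (assumption): mean of the observed values.
        fill = sum(observed) / len(observed)
        imputed = [fill] * len(truth)
        results[(mechanism, rate)] = (rmse(truth, imputed), mae(truth, imputed))
        print(mechanism, rate, results[(mechanism, rate)])
```

Errors are computed only on the entries that were removed, since those are the only ones where the imputed value and the held-back true value can differ. To try a random forest imputer in place of the mean, only the two lines that build `fill` and `imputed` would need to change.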

Journal: International Journal of Environmental Research and Public Health
Publication Date: Feb 1, 2021
Citations: 46
License type: CC BY 4.0

Similar Papers

Imputation Analysis for Time Series Air Quality (PM10) Data Set: A Comparison of Several Methods
N Shaadan ... N A M Rahim
Journal of Physics: Conference Series | VOL. 1366
01 Nov 2019

How to deal with missing longitudinal data in cost of illness analysis in Alzheimer's disease-suggestions from the GERAS observational study.
Mark Belger ... Josep Maria Haro
BMC Medical Research Methodology | VOL. 16
18 Jul 2016

Evaluating Methods for Imputing Missing Data from Longitudinal Monitoring of Athlete Workload.
Lauren C Benson ... Carolyn A Emery
Journal of Sports Science and Medicine | VOL. 20
05 Mar 2021

Multiple Imputation Ensembles for Time Series (MIE-TS)
Aliya Aleryani ... Aaron Bostrom
ACM Transactions on Knowledge Discovery from Data | VOL. 17
22 Feb 2023
