Missing Value Imputation for PM10 Concentration in Sabah using Nearest Neighbour Method (NNM) and Expectation-Maximization (EM) Algorithm

Muhammad Izzuddin Rumaling, Fuei Pien Chee, Jedol Dayou, Jackson Hian Wui Chang, Steven Soon Kai Kong, Justin Sentian

doi:10.5572/ajae.2020.14.1.062


Open Access

https://doi.org/10.5572/ajae.2020.14.1.062


Journal: Asian Journal of Atmospheric Environment
Publication Date: Mar 1, 2020
Citations: 9
License type: CC BY 4.0

Abstract

Missing data in large datasets compromises any further analysis conducted on them. The Nearest Neighbour Method (NNM) and the Expectation-Maximization (EM) algorithm are the two most widely used methods for filling in missing data. This research therefore compares the two methods by imputing missing air quality data from five monitoring stations (CA0030, CA0039, CA0042, CA0049, CA0050) in Sabah, Malaysia. PM10 (particulate matter with an aerodynamic diameter below 10 microns) datasets covering 2003–2007 (Part A) and 2008–2012 (Part B) are used. To make performance evaluation possible, missing data are introduced into the datasets at five levels (5%, 10%, 15%, 25% and 40%) and then imputed using both NNM and the EM algorithm. The performance of each imputation method is evaluated using performance indicators (RMSE, MAE, IOA, COD) and regression analysis. On both counts, NNM performs better than EM for stations CA0039, CA0042 and CA0049, which may be because the air quality data are missing at random (MAR). However, this is not the case for CA0050 and Part B of CA0030, possibly because of fluctuations that NNM could not detect. Accuracy evaluation using the Mean Absolute Percentage Error (MAPE) shows that NNM is the more accurate imputation method in most cases.
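The evaluation procedure described in the abstract (hide known values at a chosen level, impute them, then score the imputed values against the hidden ones) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the data are synthetic rather than the Sabah PM10 records, the helper names (mask_at_random, nearest_neighbour_fill) are hypothetical, and the nearest-neighbour step is a simple "copy the closest observed value in time" fill, which may differ from the NNM variant the authors used. Only RMSE, MAE and MAPE are computed, in their standard forms; IOA and COD are left out because the abstract does not state their formulas.

```python
import numpy as np

def mask_at_random(values, fraction, rng):
    """Return a copy with `fraction` of the entries set to NaN, plus their indices."""
    n_missing = int(round(fraction * values.size))
    idx = rng.choice(values.size, size=n_missing, replace=False)
    masked = values.astype(float).copy()
    masked[idx] = np.nan
    return masked, idx

def nearest_neighbour_fill(series):
    """Fill each NaN with the closest observed value in time (ties go to the left).

    Assumes at least one observed value. This is an illustrative stand-in,
    not necessarily the NNM formulation used in the paper.
    """
    observed = np.flatnonzero(~np.isnan(series))
    missing = np.flatnonzero(np.isnan(series))
    pos = np.searchsorted(observed, missing)
    left = observed[np.clip(pos - 1, 0, observed.size - 1)]
    right = observed[np.clip(pos, 0, observed.size - 1)]
    use_right = np.abs(right - missing) < np.abs(missing - left)
    filled = series.copy()
    filled[missing] = series[np.where(use_right, right, left)]
    return filled

def rmse(obs, pred):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def mae(obs, pred):
    return float(np.mean(np.abs(pred - obs)))

def mape(obs, pred):
    # Percentage error; requires non-zero observed values.
    return float(np.mean(np.abs((pred - obs) / obs)) * 100.0)

# Synthetic, strictly positive stand-in for a PM10 time series (assumption).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 20.0, 1000)
pm10 = 40.0 + 10.0 * np.sin(t) + rng.normal(0.0, 1.0, t.size)

# The five missingness levels named in the abstract.
results = {}
for level in (0.05, 0.10, 0.15, 0.25, 0.40):
    masked, hidden = mask_at_random(pm10, level, rng)
    imputed = nearest_neighbour_fill(masked)
    results[level] = {
        "RMSE": rmse(pm10[hidden], imputed[hidden]),
        "MAE": mae(pm10[hidden], imputed[hidden]),
        "MAPE": mape(pm10[hidden], imputed[hidden]),
    }

for level, scores in results.items():
    print(f"{level:.0%}", scores)
```

Scoring only the artificially hidden positions is what makes the comparison possible: those are the only points where the true value is known. An EM-based imputer could be evaluated by swapping it in for nearest_neighbour_fill and reusing the same masks.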

