Evaluation of Single Missing Value Imputation Techniques for Incomplete Air Particulates Matter (PM10) Data in Malaysia

Zuraira Libasin, Ahmad Zia Ul-Saufie, Noor Azizah Mazeni, Nur Azimah Idris, Wan Suhailah Wan Mohamed Fauzi

doi:10.47836/pjst.29.4.46

Abstract

Missing values in a dataset are a critical obstacle to accurate prediction and may lead to a misleading understanding of air pollution conditions. A dataset may contain only a small proportion of missing values (5% to 10%), but the pattern of missingness can vary. This research focuses mainly on long gaps of missing data. Single missing value imputation replaces each blank in the monitoring dataset from a selected Department of Environment (DoE) monitoring station with a value calculated by the technique that performs best for long-gap hours. The main monitored variable is PM10, and the study focuses on single imputation techniques. The techniques were tested on the Tanjung Malim monitoring station dataset and assessed with five performance indicators, and the results were compared with a previous study to determine which technique is best suited to long-gap hourly data. The research followed four stages: data acquisition, characterization of the missing values, application of the single imputation approaches, and verification of the approaches with a recommendation of the best technique. Four existing imputation techniques were used: series mean (SM), mean of nearby points (MNP), linear trend (LT), and linear interpolation (LIN). The results show that linear interpolation is the best technique for replacing missing particulate matter data, giving the lowest mean absolute error and better performance accuracy.
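The four techniques named in the abstract can be sketched as below. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the definitions follow the commonly used descriptions of SM, MNP, LT and LIN (similar to those in SPSS's Replace Missing Values procedure), while the MNP span of two points, the edge handling of LIN, the function names and the ten-value series are all hypothetical choices made for illustration. The evaluation helper mimics a common validation design (blank out a known block, impute it, score it with mean absolute error); the study's exact protocol and its other performance indicators are not reproduced here.

```python
import numpy as np


def series_mean(x: np.ndarray) -> np.ndarray:
    """SM: fill every gap with the mean of all observed values in the series."""
    out = x.copy()
    out[np.isnan(out)] = np.nanmean(x)
    return out


def mean_nearby_points(x: np.ndarray, span: int = 2) -> np.ndarray:
    """MNP: fill each gap with the mean of up to `span` observed values on
    each side (span=2 is an assumed setting, not taken from the study)."""
    out = x.copy()
    observed = np.flatnonzero(~np.isnan(x))
    for i in np.flatnonzero(np.isnan(x)):
        before = observed[observed < i][-span:]
        after = observed[observed > i][:span]
        neighbours = np.concatenate([before, after])
        if neighbours.size:  # leave the gap as NaN if no neighbours exist
            out[i] = x[neighbours].mean()
    return out


def linear_trend(x: np.ndarray) -> np.ndarray:
    """LT: regress the observed values on the time index (ordinary least
    squares) and fill each gap with the fitted value at that index."""
    t = np.arange(x.size)
    ok = ~np.isnan(x)
    slope, intercept = np.polyfit(t[ok], x[ok], deg=1)
    out = x.copy()
    out[~ok] = intercept + slope * t[~ok]
    return out


def linear_interpolation(x: np.ndarray) -> np.ndarray:
    """LIN: join the last observed value before a gap and the first observed
    value after it with a straight line. np.interp holds the edge values
    constant, so gaps at the very start or end become flat fills (assumed)."""
    t = np.arange(x.size)
    ok = ~np.isnan(x)
    out = x.copy()
    out[~ok] = np.interp(t[~ok], t[ok], x[ok])
    return out


def mean_absolute_error(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Average absolute difference between true and imputed values."""
    return float(np.mean(np.abs(actual - predicted)))


METHODS = {
    "SM": series_mean,
    "MNP": mean_nearby_points,
    "LT": linear_trend,
    "LIN": linear_interpolation,
}


def evaluate_gap(truth: np.ndarray, gap: slice) -> dict:
    """Blank out a contiguous block of a complete series, impute it with each
    method, and score the imputed block against the held-out true values."""
    x = truth.astype(float)  # astype returns a copy, so truth is untouched
    x[gap] = np.nan
    return {name: mean_absolute_error(truth[gap], fill(x)[gap])
            for name, fill in METHODS.items()}


if __name__ == "__main__":
    # Hypothetical hourly PM10 readings (ug/m3), invented for illustration only.
    truth = np.array([40, 42, 45, 47, 50, 52, 49, 46, 44, 41], dtype=float)
    scores = evaluate_gap(truth, slice(3, 7))  # a 4-hour gap
    for name, mae in sorted(scores.items(), key=lambda kv: kv[1]):
        print(f"{name}: MAE = {mae:.2f}")
```

Because the series is invented, the ranking it prints says nothing about the study's findings; it only shows how an MAE comparison across the four techniques can be wired up. SM and LT use the whole series regardless of where the gap sits, whereas MNP and LIN rely only on observations next to the gap.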
