A BAD IDEA OF USING MODE IMPUTATION METHOD

Afiqah Bazlla Md Soom, Roger Canda, Aszila Asmat, Juhaida Ismail, Aisyah Mat Jasin

doi:10.35631/jistm.729001

Abstract

Missing data is a recurring issue in psychology questionnaires when respondents do not answer some questions for personal reasons. In general, two types of imputation techniques are used to replace missing data: single imputation and multiple imputation (MI). Single imputation generates one value to replace each missing value, and its simplest methods are mean, median and mode imputation. In contrast, multiple imputation imputes each missing value several times, resulting in multiple completed datasets. The most popular MI method that can handle both numerical and categorical data is predictive mean matching (PMM). The aim of this article is to compare and visualize how the mode imputation method (a single imputation technique) leads to a biased data distribution and how the PMM method (an MI technique) reduces this bias. Both mode imputation and PMM are often considered when dealing with categorical data. Mode imputation replaces a missing value with the most frequent response to that survey item. PMM, in contrast, is an extension of a regression model that applies a donor-selection strategy: each missing value is replaced by an observed response taken from a respondent whose predicted value is close to that of the incomplete case. Bar charts of the results show that multiple imputation yields less discrepancy between the original and imputed distributions. Thus, this research concludes that the PMM method produces a less biased distribution than mode imputation. A comparison of imputation methods at different missing rates on a survey dataset is suggested for future work.
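To make the contrast between the two methods concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes synthetic data, a single fully observed numeric predictor `x` (for example, a respondent's total score on the other items), and one 1-5 Likert item `y` whose missing responses are coded as `None`. The PMM step is deliberately simplified: the regression parameters are perturbed by bootstrapping the observed cases, and each missing response is copied from a random donor among the `k` observed cases whose predictions are closest. The names `mode_impute`, `pmm_impute`, `fit_ols` and the parameters `m`, `k` and `seed` are hypothetical.

```python
import random
from collections import Counter
from statistics import mean


def mode_impute(values):
    """Single imputation: replace every None with the most frequent observed value.

    Ties go to the value encountered first in the list.
    """
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def fit_ols(xs, ys):
    """Least-squares fit of y = a + b * x with one predictor (assumed model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # guard: a bootstrap sample may repeat one x
    return my - b * mx, b


def pmm_impute(x, y, m=5, k=5, seed=0):
    """Simplified predictive mean matching; returns m completed copies of y."""
    rng = random.Random(seed)
    obs = [i for i, v in enumerate(y) if v is not None]
    mis = [i for i, v in enumerate(y) if v is None]
    a_hat, b_hat = fit_ols([x[i] for i in obs], [y[i] for i in obs])
    completed = []
    for _ in range(m):
        # Bootstrap the observed cases so each imputation uses perturbed parameters.
        boot = [rng.choice(obs) for _ in obs]
        a_star, b_star = fit_ols([x[i] for i in boot], [y[i] for i in boot])
        filled = list(y)
        for j in mis:
            pred_mis = a_star + b_star * x[j]
            # Rank observed cases by how close their prediction is to the missing case's.
            donors = sorted(obs, key=lambda i: abs(a_hat + b_hat * x[i] - pred_mis))[:k]
            filled[j] = y[rng.choice(donors)]  # copy a real observed response
        completed.append(filled)
    return completed


def freq(vals):
    """Count responses per Likert category 1..5."""
    c = Counter(vals)
    return {cat: c[cat] for cat in range(1, 6)}


# Synthetic demo: a Likert item correlated with x, about 30% of responses deleted at random.
gen = random.Random(42)
x = [gen.gauss(0, 1) for _ in range(300)]
y_full = [min(5, max(1, round(3 + 1.2 * xi + gen.gauss(0, 0.7)))) for xi in x]
y = [None if gen.random() < 0.3 else v for v in y_full]

print("complete data :", freq(y_full))
print("mode imputed  :", freq(mode_impute(y)))
pooled = [v for d in pmm_impute(x, y, m=5) for v in d]
print("PMM (mean of 5 datasets):", {c: n / 5 for c, n in freq(pooled).items()})
```

Printing the three frequency tables (or drawing them as bar charts) shows mode imputation piling every missing response onto a single category, while the PMM counts, averaged over the completed datasets, typically stay closer to the complete data because each imputed value is a real response from a similar respondent.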
