Comparative Study of Various Methods of Handling Missing Data

Fredrick Ochieng’ Odhiambo

doi:10.11648/j.mma.20200502.14

Abstract

The scientific literature lacks a straightforward answer as to which of the existing methods of missing-data imputation is most suitable in terms of simplicity, accuracy and ease of use. Various methods of data imputation are explored, and a robust method of data imputation is then proposed. The paper uses simulated data sets generated for various distributions. A regression function is fitted to the simulated data sets, and the residual standard errors of the fitted function are obtained. Data are then randomly deleted from the set of independent variables to create artificial non-response, and suitable methods are used to impute the missing data. Mean imputation, regression imputation, hot and cold decking, multiple imputation, median imputation, listwise deletion, the EM algorithm and the nearest neighbour method are considered. This paper investigates the three most common traditional methods of handling missing data to establish the most optimal method. The most suitable method is taken to be the one whose imputed sample characteristics do not vary considerably from those of the original data set before imputation. This variation is measured using the regression intercept and the residual standard error. The R statistical package is used for most of the regression cases. Microsoft Excel is used to determine the correlation of columns in the hot decking method, because it is readily available as a component of the Microsoft package. The results in the data analysis section show that median imputation yields intercept and R-squared values that closely mirror those of the original data sets, suggesting that it is the better imputation method among the conventional methods. This finding is important from a research point of view, given the many cases of missing data in scientific research. Finding and using the median is simple, so most researchers have a ready tool at hand for handling missing data.
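The comparison procedure described in the abstract can be illustrated with a minimal sketch. It is written in Python with NumPy rather than the R package used in the paper, and it is not the author's code. The exponential predictor, the true coefficients (3.0 and 1.5), the sample size, the 20% deletion rate and the helper names `fit_ols` and `impute` are all assumptions chosen for illustration.

```python
import numpy as np

def fit_ols(x, y):
    """Return (intercept, slope, residual standard error) for y ~ x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Residual standard error with n - 2 degrees of freedom.
    rse = np.sqrt(resid @ resid / (len(y) - 2))
    return beta[0], beta[1], rse

def impute(x, how):
    """Fill NaN entries of x with the mean or median of the observed values."""
    fill = np.nanmean(x) if how == "mean" else np.nanmedian(x)
    return np.where(np.isnan(x), fill, x)

rng = np.random.default_rng(0)
n = 500
x = rng.exponential(scale=2.0, size=n)             # assumed skewed predictor
y = 3.0 + 1.5 * x + rng.normal(scale=1.0, size=n)  # assumed true model

# Create artificial non-response: delete 20% of the predictor at random.
x_missing = x.copy()
drop = rng.choice(n, size=n // 5, replace=False)
x_missing[drop] = np.nan

# Fit on the original data, then on each imputed version, and compare.
results = {"original": fit_ols(x, y)}
for how in ("mean", "median"):
    results[how] = fit_ols(impute(x_missing, how), y)

for name, (b0, b1, rse) in results.items():
    print(f"{name:8s} intercept={b0:.3f} slope={b1:.3f} rse={rse:.3f}")
```

Under the criterion stated in the abstract, the preferred method would be the one whose intercept and residual standard error stay closest to the row labelled `original`. A single simulated data set such as this one is only an illustration and does not by itself reproduce the paper's findings.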

Highlights

Research is the driving force behind any development of a nation.
It is worth noting that when part of the data from a survey is missing and the missing data are ignored by using only the available sample, the results may not be representative of the population under study, since some of its characteristics are missing.
According to [16], listwise deletion is regarded as the most common and easiest method of dealing with missing data; according to [11], it is also called complete case analysis. This approach leads to a reduction in sample size, which in turn translates into reduced statistical power and brings into question how representative the remaining sample is of the population being studied.
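The sample-size reduction caused by listwise deletion can be seen in a small sketch. The data matrix and the helper name `listwise_delete` are hypothetical and serve only to illustrate the point.

```python
import numpy as np

def listwise_delete(data):
    """Keep only the rows (cases) that have no missing value in any column."""
    keep = ~np.isnan(data).any(axis=1)
    return data[keep]

# Hypothetical survey: 5 respondents, 3 variables, 3 of 15 values missing.
data = np.array([
    [1.0,    2.0,    3.0],
    [4.0,    np.nan, 6.0],
    [7.0,    8.0,    np.nan],
    [10.0,   11.0,   12.0],
    [np.nan, 14.0,   15.0],
])

complete = listwise_delete(data)
print(f"{len(data)} rows before, {len(complete)} rows after")
# prints "5 rows before, 2 rows after"
```

Here only 20% of the individual values are missing, yet 60% of the cases are discarded, which is the loss of sample size and statistical power noted above.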

Summary

Introduction

Research is the driving force behind any development of a nation. Any endeavour in this area requires that the people concerned with the research arm themselves with the right kind of tools to help them get accurate and relevant information from the survey being undertaken. Missing data is a big challenge in many areas of research, especially in social research. It is worth noting that when part of the data from a survey is missing and the missing data are ignored by using only the available sample, the results may not be representative of the population under study, since some of its characteristics are missing. For a detailed review of these approaches, see [25].

EM Algorithm

List Wise Deletion

Mean Substitution

Regression Imputation

Multiple Imputations

Hot Decking

Median Imputation

Regression for Complete Data Set

Methods

Parameter Estimation

Results

Conclusion


Journal: Mathematical Modelling and Applications	Publication Date: Jan 1, 2020
Citations: 5	License type: cc-by


