Handling Missing Values in Information Systems Research: A Review of Methods and Assumptions

Jiaxu Peng, Jungpil Hahn, Ke-Wei Huang

doi:10.2139/ssrn.3560070

Abstract

In today’s big data environment, missing values continue to be a problem that harms data quality. The bias caused by missing values raises the greatest concern because it cannot be eliminated simply by increasing the sample size. Although the statistics literature has developed approaches to handling missing values and formulated assumptions regarding when these approaches generate valid statistical inferences, these prescriptions have yet to be broadly accepted by many social science disciplines, including the Information Systems (IS) discipline. By reviewing recently published empirical research in information systems, we find that missing values are indeed an important and pervasive problem. We believe that a review of missing value theory is necessary for the IS community to understand the nature of missing values and to promote more rigorous research practice, given that missing values are often unavoidable. In addition, the not missing at random (NMAR) mechanism poses challenges for parameter estimation. We contribute to research practice by proposing a Monte Carlo likelihood approach and demonstrating its superior performance in correcting bias in parameter estimation. We conclude by suggesting that research validity can be enhanced through the reasoned adoption of missing value handling methods and missing value reporting practices.

Full Text