Abstract

In clinical trials and longitudinal studies, missing values are a fundamental challenge because they can introduce bias and lead to erroneous statistical inferences. Several methods have been developed in the literature to handle missing values; the most commonly used are the complete case method, mean imputation, last observation carried forward (LOCF), and multiple imputation (MI). In this paper, we conduct a simulation study to investigate the efficiency of these four methods for longitudinal data under a missing completely at random (MCAR) mechanism. We consider three levels of missingness: 5%, 30%, and 50%. The simulation shows that the LOCF method has more bias than the other three methods in most situations, while the MI method has the least bias and the best coverage probability. We therefore conclude that MI is the most effective of the four methods in our MCAR simulation study.
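To make the setup concrete, here is a minimal sketch of MCAR missingness and two of the single-imputation methods named above (mean imputation and LOCF). It is not the authors' simulation code: the data layout (one list per subject, one value per measurement occasion), the linear trend, the sample size, and all function names are assumptions chosen for illustration.

```python
import random


def make_mcar(data, rate, rng):
    """Copy `data`, setting each value to None independently with probability `rate`.

    Missingness does not depend on any observed or unobserved value, which is
    the defining property of MCAR.
    """
    return [[None if rng.random() < rate else v for v in row] for row in data]


def mean_impute(data):
    """Replace each missing value with the observed mean of its measurement occasion."""
    n_cols = len(data[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def locf(data):
    """Carry each subject's last observed value forward.

    Values missing before the first observation stay None.
    """
    result = []
    for row in data:
        last = None
        filled = []
        for v in row:
            if v is not None:
                last = v
            filled.append(last)
        result.append(filled)
    return result


def column_mean(data, j):
    """Mean of the non-missing values at occasion j (None if nothing is observed)."""
    observed = [row[j] for row in data if row[j] is not None]
    return sum(observed) / len(observed) if observed else None


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: 500 subjects, 4 occasions, upward trend plus noise.
    full = [[10.0 + 2.0 * t + rng.gauss(0.0, 1.0) for t in range(4)] for _ in range(500)]
    for rate in (0.05, 0.30, 0.50):  # the three missingness levels in the abstract
        holed = make_mcar(full, rate, rng)
        print(rate,
              column_mean(full, 3),
              column_mean(mean_impute(holed), 3),
              column_mean(locf(holed), 3))
```

When the outcome trends over time, as in this hypothetical data, LOCF replaces later values with earlier ones, so the estimated mean at the last occasion tends to be pulled toward earlier levels. This is one intuition for why LOCF can be biased even under MCAR.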

Highlights

  • Missing values often occur in clinical trials and longitudinal studies

  • Shrive et al. [10] suggested that the multiple imputation (MI) method was the most accurate method for dealing with missing data in most data scenarios, although in some situations the mean imputation method performed slightly better than the MI method

  • White and Carlin [11] made a similar point, stating that the complete case method was more efficient than the MI method in some scenarios, even though the MI method is widely advocated as an improvement over the complete case method

Summary

Introduction

Missing values often occur in clinical trials and longitudinal studies. Whenever data are missing, information is lost, which reduces efficiency and the precision of statistical inference. When the dataset is large enough, the analysis can use the complete case method, in which a subject is deleted entirely whenever that subject has a missing value at any measurement occasion. Some statistical procedures and software packages apply this deletion automatically and then run as though there were no missing values. A rule of thumb suggests that 20% or less missing data is acceptable for imputation [1,2,3,4], but no clear rules exist regarding how much missing data is too much [5].
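The information loss from complete case deletion can be sketched in a few lines. The example below is illustrative only: it assumes each value is missing independently with the same probability (a simple MCAR model) and four measurement occasions per subject; neither the function names nor these settings come from the paper.

```python
def complete_cases(data):
    """Keep only subjects with no missing value (None) at any measurement occasion."""
    return [row for row in data if all(v is not None for v in row)]


def expected_retained(rate, n_occasions):
    """Expected fraction of subjects kept when each value is missing
    independently with probability `rate`: (1 - rate) ** n_occasions."""
    return (1.0 - rate) ** n_occasions


if __name__ == "__main__":
    sample = [
        [5.1, 5.4, 5.9, 6.2],    # fully observed: kept
        [4.8, None, 5.5, 5.7],   # one missing occasion: whole subject dropped
        [5.0, 5.2, None, None],  # dropped
    ]
    print(len(complete_cases(sample)), "of", len(sample), "subjects retained")
    for rate in (0.05, 0.30, 0.50):
        print(rate, expected_retained(rate, 4))
```

Under these assumptions, per-value missingness of 5%, 30%, and 50% leaves roughly 81%, 24%, and 6% of subjects as complete cases, which shows how quickly subject-level deletion discards data as the number of occasions or the missingness rate grows.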

Background
Missing Mechanism
Imputation Methods
Simulation Settings
Simulation Performance Measures
Missingness Mechanism
Simulation Result
Method Original Complete
Simulation in Other Scenarios
Simulation Result with Small ρ Value
Simulation Result with Unstructured Correlation Structure
Findings
Discussion and Conclusions