Abstract

Missing data occur frequently in longitudinal data analysis, and many methods have been proposed in the literature to handle them. Complete case (CC), mean substitution (MS), last observation carried forward (LOCF), and multiple imputation (MI) are the four most frequently used methods in practice. In a real-world data analysis, the missing data can be missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR), depending on the reasons the data are missing. In this paper, simulations under various scenarios (varying the missing mechanism, missing rate, and slope size) were conducted to evaluate the performance of the four methods, using bias, root mean squared error (RMSE), and 95% coverage probability as evaluation criteria. The results showed that LOCF has the largest bias and the poorest 95% coverage probability in most cases under both the MAR and MCAR missing mechanisms. Hence, LOCF should not be used in longitudinal data analysis. Under the MCAR missing mechanism, CC and MI perform equally well. Under the MAR missing mechanism, MI has the smallest bias, the smallest RMSE, and the best 95% coverage probability. Therefore, either CC or MI is appropriate under MCAR, while MI is the more reliable and better-grounded statistical method under MAR.
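To make the three single-step methods concrete, the following is a minimal Python sketch of CC, MS, and LOCF applied to a small subjects-by-visits table. The toy data, the function names, and the use of `None` as the missing-value marker are illustrative assumptions, not taken from the paper. MS is assumed here to substitute the mean of the observed values at the same visit; the paper may define it differently. MI is omitted because it requires fitting an imputation model and pooling several completed data sets.

```python
from statistics import mean

# Hypothetical toy data: one row per subject, one column per visit.
# None marks a missing observation.
data = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 3.0, None, None, None],
    [0.0, 1.0, 2.0, None, None],
]

def complete_case(rows):
    """CC: keep only subjects observed at every visit."""
    return [row[:] for row in rows if all(v is not None for v in row)]

def mean_substitution(rows):
    """MS (assumed variant): fill a gap with the mean of the values observed at that visit."""
    n_visits = len(rows[0])
    visit_means = [
        mean(row[t] for row in rows if row[t] is not None)
        for t in range(n_visits)
    ]
    return [
        [visit_means[t] if v is None else v for t, v in enumerate(row)]
        for row in rows
    ]

def locf(rows):
    """LOCF: carry each subject's last observed value forward.

    A gap before the first observed value stays None, since there is
    nothing to carry forward.
    """
    filled_rows = []
    for row in rows:
        filled, last = [], None
        for v in row:
            if v is not None:
                last = v
            filled.append(last)
        filled_rows.append(filled)
    return filled_rows
```

In this toy table every subject rises by one unit per visit, yet LOCF holds the second subject at 3.0 for the last three visits, which illustrates how carrying values forward can flatten an underlying trend.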

Highlights

  • The problem of missing observations can frequently occur in all types of clinical trials, especially when observations are measured repeatedly at each scheduled visit for the same subject in a longitudinal study

  • CC, MS, LOCF, MI, MCAR, and MAR stand for complete case, mean substitution, last observation carried forward, multiple imputation, missing completely at random, and missing at random, respectively

  • When the slope is 10, the 95% coverage probabilities (CP) of the LOCF method are very poor (100% and 0% for the intercept and slope, respectively)
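The evaluation criteria named above (bias, RMSE, and 95% coverage probability) can be sketched in Python as follows. This is a hedged illustration: the function names are hypothetical, and the coverage calculation assumes a normal-approximation interval of the form estimate ± 1.96 × standard error, which may differ from the interval used in the paper.

```python
from math import sqrt

def bias(estimates, truth):
    """Average estimate across simulation replicates minus the true value."""
    return sum(estimates) / len(estimates) - truth

def rmse(estimates, truth):
    """Root mean squared error of the estimates around the true value."""
    return sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))

def coverage(estimates, std_errors, truth, z=1.96):
    """Share of replicates whose interval estimate +/- z*SE contains the truth.

    z = 1.96 gives an approximate 95% interval (assumed normal approximation).
    """
    hits = sum(
        1
        for e, se in zip(estimates, std_errors)
        if e - z * se <= truth <= e + z * se
    )
    return hits / len(estimates)
```

For example, with hypothetical slope estimates `[9.8, 10.1, 10.3, 9.9]`, standard errors of 0.1 each, and a true slope of 10, two of the four intervals contain the truth, so `coverage` returns 0.5. A coverage of 0% for the slope means that no replicate's interval contained the true slope.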


Summary

Introduction

The problem of missing observations can frequently occur in all types of clinical trials, especially when observations are measured repeatedly at each scheduled visit for the same subject in a longitudinal study (Comparison of Four Methods for Handing Missing Data in Longitudinal Data Analysis through a Simulation Study, 2014). In statistical practice, missing data is a key problem that can never be avoided completely.

In the simulation, the response Yit (t = 1, 2, …, 5) for the ith subject at the tth visit is generated according to a multivariate normal distribution model with E(Yit) = β0 + β1t, where β0 is the intercept and β1 is the slope. A data set X with n rows and p columns is drawn from a multivariate normal distribution with a zero mean vector and a variance-covariance matrix Σ = (σs,t), s, t = 1, …, 5, whose first row runs from σ1,1 to σ1,5. The variance at each occasion is assumed to be constant over time, while the correlation between Yis and Yit is assumed to follow a first-order autoregressive model, AR(1), with a positive correlation coefficient ρ.
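The data-generating model described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names and argument names are hypothetical, and the AR(1) structure is taken in its standard form, with constant variance σ² and covariance σ²ρ^|s−t| between visits s and t. Drawing the errors recursively, as done here, yields exactly that covariance without needing a matrix library.

```python
import random
from math import sqrt

def ar1_covariance(sigma, rho, n_visits=5):
    """Covariance matrix with constant variance sigma^2 and AR(1) correlation rho^|s-t|."""
    return [
        [sigma ** 2 * rho ** abs(s - t) for t in range(n_visits)]
        for s in range(n_visits)
    ]

def simulate_subject(beta0, beta1, sigma, rho, n_visits=5, rng=random):
    """Draw one subject's responses with mean beta0 + beta1 * t and AR(1) errors.

    rng can be the random module or a seeded random.Random instance.
    """
    # Innovation scale chosen so that every visit keeps variance sigma^2.
    innovation_sd = sigma * sqrt(1.0 - rho ** 2)
    error = sigma * rng.gauss(0.0, 1.0)  # error at the first visit
    responses = []
    for t in range(1, n_visits + 1):
        if t > 1:
            error = rho * error + innovation_sd * rng.gauss(0.0, 1.0)
        responses.append(beta0 + beta1 * t + error)
    return responses

# Hypothetical parameter values, chosen only for illustration.
rng = random.Random(2014)
sample = [simulate_subject(1.0, 10.0, 2.0, 0.5, rng=rng) for _ in range(3)]
```

Each element of `sample` is one subject's five responses. Missing values would then be introduced according to the chosen mechanism (MCAR or MAR) before the four methods are applied.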
