The Case of the Missing Data: Methods of Dealing with Dropouts and other Research Vagaries

David L Streiner

doi:10.1177/070674370204700111

Abstract

Missing data are common in most studies, especially when subjects are followed over time. Missing data can jeopardize the validity of a study because of reduced power to detect differences, and especially because subjects who are lost to follow-up rarely represent the group as a whole. There are several approaches to handling missing data, but some may result in biased estimates of the treatment effect, and others may overestimate the significance of the statistical tests. When cross-sectional data (for example, demographic and background information and an outcome measured at a single time point) are missing, replacement with the group mean leads to an underestimate of the standard deviation (SD) and inflation of the Type I error rate. Using regression estimates, especially with error built into the imputed value, lessens but does not eliminate this problem. Multiple imputation preserves the estimates of both the mean and the SD, even when a substantial proportion of the data are missing. With longitudinal studies, the last observation carried forward (LOCF) approach preserves the sample size, but may make unwarranted assumptions about the missing data, resulting in either underestimating or overestimating the treatment effects. Growth curve analysis makes maximal use of the existing data and makes fewer assumptions.
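
Two of the claims above can be illustrated with a small simulation. The following is a minimal sketch, not the author's analysis: the data are simulated, the 30% missingness is assumed to be completely at random, and the helper names `mean_impute` and `locf` are hypothetical. It shows that filling gaps with the group mean leaves the mean unchanged but shrinks the SD, and that LOCF simply freezes a subject at the last recorded value.

```python
import random
import statistics


def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    group_mean = statistics.mean(observed)
    return [group_mean if v is None else v for v in values]


def locf(series):
    """Last observation carried forward; gaps before the first observation stay None."""
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last)
    return filled


# Simulated scores (assumption: normal, mean 50, SD 10, n = 200).
rng = random.Random(0)
complete = [rng.gauss(50, 10) for _ in range(200)]

# Assumption: 60 of the 200 values (30%) go missing completely at random.
missing_idx = set(rng.sample(range(200), 60))
with_gaps = [None if i in missing_idx else v for i, v in enumerate(complete)]

observed = [v for v in with_gaps if v is not None]
imputed = mean_impute(with_gaps)

sd_observed = statistics.stdev(observed)
sd_imputed = statistics.stdev(imputed)

print(f"mean, observed cases only: {statistics.mean(observed):.2f}")
print(f"mean, after mean imputation: {statistics.mean(imputed):.2f}")
print(f"SD, observed cases only: {sd_observed:.2f}")
print(f"SD, after mean imputation: {sd_imputed:.2f}")

# Hypothetical subject whose score was falling before dropout after visit 2.
visits = [30, 24, None, None]
print(locf(visits))
```

The imputed values all sit exactly at the mean, so they add nothing to the sum of squared deviations while increasing the sample size; the SD therefore has to fall, which in turn makes test statistics look larger than they should. In the LOCF example, the subject is treated as having stayed at 24 for the rest of the study, whether the true course was continued change or relapse.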

Full Text