Abstract

One of the most common challenges in biomedical and psychosocial research is missing data, which occurs when respondents refuse to provide answers to sensitive questions and when study subjects are lost to follow-up during the repeated assessments of longitudinal trials. This paper is the first in a 3-part series focusing on this important topic; it describes different types of missing data and their differential effects on model estimates, focusing on study design strategies that can be used to prevent or minimize missing data and, thus, maintain the scientific integrity of the research. The second paper in the series will discuss implementation strategies to manage and reduce missing data while conducting the study, and the third paper will discuss analytic strategies for dealing with missing data after completion of data collection. It is always worth devoting careful attention to issues like missing data that may have significant effects on model estimates. As the saying goes, ‘an ounce of prevention is worth a pound of cure,’ so it is much better to focus on these issues during the planning stage of a study rather than having to deal with them later in the study. In this paper, we focus squarely on such preventive strategies as the first line of defense against the ubiquitous problem of missing data in clinical research studies.

1. Types of missing data and their effects on model estimates

The reasons for missing data vary, and the degree to which missing data decreases the validity of the estimates depends on how the missing data arises. Thus, it is important to make plausible assumptions about how missing data occurs in a study and, based on these assumptions, select appropriate models for addressing the effect of the missing data on inference from the observed results of the study. There are three statistical models with increasing levels of generalizability that are commonly used to classify different types of missing data.
If missing data occur in a random fashion (that is, with no particular pattern that determines which data are observed and which are missing), they are typically referred to as ‘missing completely at random’ (MCAR). When data are MCAR, the missingness is unrelated to any of the study participants’ outcomes and, thus, may be ignored because it does not result in an inferential bias.

In many follow-up studies, participants may be lost to follow-up because of deteriorated or improved health conditions. In this situation the missing data are not MCAR because the probability of a missed visit depends on the outcome. For example, if an investigational medication worsens depression, subjects may drop out from the study over time, creating a so-called ‘monotone missing data’ pattern. In cases like this, where treatment-related effects influence whether data are missing, ignoring the missing data and focusing only on the subjects with complete data will usually give rise to biased estimates of treatment effects.

If depression measures taken at baseline (or prior to dropping out) can be used to model this dependence structure in the missing data pattern (i.e., the relationship between baseline severity of depression and dropping out), this information can then be incorporated into the model for treatment effects to address the bias. If the dependence structure between the outcome of interest and the missing data can be modeled based on observed data, the missing data are classified as ‘missing at random’ (MAR).
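
The contrast between MCAR and MAR can be illustrated with a small simulation. The sketch below is not part of the original analysis; it rests on purely hypothetical assumptions: a baseline depression score observed for every subject, a follow-up score that depends linearly on baseline, and arbitrary dropout probabilities (equal for everyone under MCAR, higher for subjects with severe baseline scores under MAR). All function names and numeric settings are invented for illustration. Under these assumptions, the complete-case mean stays close to the full-data mean under MCAR but drifts away from it under MAR, and a simple regression on the observed baseline score largely removes that drift.

```python
import random
import statistics


def simulate_scores(n, rng):
    """Return (baseline, followup) scores for n hypothetical subjects."""
    rows = []
    for _ in range(n):
        baseline = rng.gauss(20.0, 5.0)                  # observed for everyone
        followup = 0.8 * baseline + rng.gauss(0.0, 3.0)  # outcome that may go missing
        rows.append((baseline, followup))
    return rows


def apply_mcar(rows, rng, p_drop=0.4):
    """Delete follow-up values with the same probability for every subject."""
    return [(b, None if rng.random() < p_drop else y) for b, y in rows]


def apply_mar(rows, rng, cutoff=20.0, p_high=0.7, p_low=0.1):
    """Delete follow-up values more often when the observed baseline is severe."""
    out = []
    for b, y in rows:
        p_drop = p_high if b > cutoff else p_low
        out.append((b, None if rng.random() < p_drop else y))
    return out


def complete_case_mean(rows):
    """Mean follow-up score among subjects who were not lost to follow-up."""
    return statistics.fmean(y for _, y in rows if y is not None)


def baseline_adjusted_mean(rows):
    """Fit followup ~ baseline on completers, then average predictions over everyone."""
    completers = [(b, y) for b, y in rows if y is not None]
    mean_b = statistics.fmean(b for b, _ in completers)
    mean_y = statistics.fmean(y for _, y in completers)
    sxy = sum((b - mean_b) * (y - mean_y) for b, y in completers)
    sxx = sum((b - mean_b) ** 2 for b, _ in completers)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_b
    return statistics.fmean(intercept + slope * b for b, _ in rows)


def run_demo(n=20000, seed=1):
    """Compare the full-data mean with estimates under each missingness pattern."""
    rng = random.Random(seed)
    full = simulate_scores(n, rng)
    mcar = apply_mcar(full, rng)
    mar = apply_mar(full, rng)
    return {
        "full_data_mean": statistics.fmean(y for _, y in full),
        "mcar_complete_case": complete_case_mean(mcar),
        "mar_complete_case": complete_case_mean(mar),
        "mar_baseline_adjusted": baseline_adjusted_mean(mar),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.2f}")
```

In this sketch the adjustment works only because dropout was constructed to depend on the observed baseline score alone; it is meant to show the logic of the MAR classification, not to recommend a particular analytic method.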
