Missing Data in Clinical Research: A Tutorial on Multiple Imputation

Peter C Austin, Ian R White, Douglas S Lee, Stef van Buuren

doi:10.1016/j.cjca.2020.11.010

Abstract

Missing data is a common occurrence in clinical research. Missing data occurs when the values of the variables of interest are not measured or recorded for all subjects in the sample. Common approaches to addressing the presence of missing data include complete-case analysis, where subjects with missing data are excluded, and mean-value imputation, where missing values are replaced with the mean value of that variable among those subjects for whom it is not missing. However, in many settings, these approaches can lead to biased estimates of statistics (eg, of regression coefficients) and/or confidence intervals that are artificially narrow. Multiple imputation (MI) is a popular approach for addressing the presence of missing data. With MI, multiple plausible values of a given variable are imputed or filled in for each subject who has missing data for that variable. This results in the creation of multiple completed data sets. Identical statistical analyses are conducted in each of these completed data sets and the results are pooled across the completed data sets. We provide an introduction to MI and discuss issues in its implementation, including developing the imputation model, how many imputed data sets to create, and addressing derived variables. We illustrate the application of MI through an analysis of data on patients hospitalised with heart failure. We focus on developing a model to estimate the probability of 1-year mortality in the presence of missing data. Statistical software code for conducting MI in R, SAS, and Stata is provided.
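The pooling step mentioned in the abstract can be illustrated with a short sketch. The abstract does not give the pooling formula; the sketch assumes the standard approach known as Rubin's rules, in which the pooled estimate is the mean of the per-data-set estimates and the pooled variance combines the within-imputation and between-imputation variances. The function name `pool_rubin` and the example numbers are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool M estimates (e.g. one regression coefficient per completed data set).

    estimates: point estimates, one per completed data set
    variances: squared standard errors, one per completed data set
    Returns (pooled_estimate, pooled_standard_error).
    Assumption: Rubin's rules are the pooling method (not stated in the abstract).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates and matching variances")
    pooled_estimate = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance of the estimates (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled_estimate, math.sqrt(total)


# Hypothetical coefficients and squared standard errors from M = 3 completed data sets
estimate, std_error = pool_rubin([0.50, 0.54, 0.46], [0.01, 0.01, 0.01])
print(round(estimate, 3), round(std_error, 3))
```

The between-imputation term is what prevents the artificially narrow confidence intervals that arise when imputed values are treated as if they had been observed.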

Highlights

Missing data is a common occurrence in clinical research
We provide an introduction to multiple imputation (MI) and discuss issues related to its use
White et al suggested that, as a rule of thumb, the number of imputed data sets should be at least as large as the percentage of subjects with any missing data.[11] They suggest that this will result in estimates of regression coefficients, test statistics, and P values that vary only slightly across repeated MI analyses
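As a rough illustration of the rule of thumb above, the following sketch computes the percentage of subjects with any missing value and rounds it up to give a minimum number of imputed data sets. The function name and the data layout (one list per subject, with `None` marking a missing value) are assumptions made for this example only.

```python
def min_imputations(rows):
    """Return the smallest M satisfying the rule of thumb.

    rows: one sequence per subject; None marks a missing value (assumed layout).
    M is the percentage of subjects with any missing value, rounded up.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    n_incomplete = sum(1 for row in rows if any(v is None for v in row))
    # Integer ceiling of 100 * n_incomplete / len(rows)
    return (100 * n_incomplete + len(rows) - 1) // len(rows)


# Hypothetical sample: 3 of 10 subjects have at least one missing value
sample = [[1.0, 2.0]] * 7 + [[None, 2.0], [1.0, None], [None, None]]
print(min_imputations(sample))
```

The count is of subjects with any missing value, not of missing cells, which matches the wording of the rule.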

Summary

Multiple imputation using multivariate imputation by chained equations

Conditional specification is a strategy for specifying multivariate models through conditional distributions. The imputation process described above uses linear regression and takes the imputed values as random draws from a normal distribution. A second option is to draw imputations from the observed values by a technique called predictive-mean matching (PMM).[11] For a given subject with missing data on the variable in question, PMM identifies those subjects with no missing data on that variable whose linear predictors (created using the regression coefficients from the fitted imputation model) are close to the linear predictor of the given subject (created using the regression coefficients sampled from the appropriate posterior distribution, as described above). Of those subjects who are close, one subject is selected at random, and the observed value of the variable for that randomly selected subject is used as the imputed value for the subject with missing data. Thus, with PMM the imputed values are drawn from an observed empirical distribution, in contrast to the regression-based approach, where they are drawn from a normal distribution.
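The donor-selection idea behind PMM can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the procedure used in the paper: it uses a single fully observed predictor, fits an ordinary least-squares line, and uses the same fitted coefficients for donors and recipients, omitting the step in which coefficients are sampled from their posterior distribution. The function name `pmm_impute` and the donor-pool size `k` are hypothetical.

```python
import random


def pmm_impute(x, y, k=5, rng=None):
    """Impute missing y values (None) by simplified predictive-mean matching.

    x: fully observed predictor values
    y: outcome values, with None where missing
    k: number of closest donors to choose from (hypothetical parameter)
    Simplification: no posterior draw of the regression coefficients.
    """
    rng = rng or random.Random()
    observed = [i for i, v in enumerate(y) if v is not None]
    n = len(observed)
    mean_x = sum(x[i] for i in observed) / n
    mean_y = sum(y[i] for i in observed) / n
    sxx = sum((x[i] - mean_x) ** 2 for i in observed)
    sxy = sum((x[i] - mean_x) * (y[i] - mean_y) for i in observed)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Linear predictor for every subject, observed or not
    predicted = [intercept + slope * xi for xi in x]

    completed = list(y)
    for i, value in enumerate(y):
        if value is None:
            # Donors: observed subjects whose linear predictors are closest
            donors = sorted(observed, key=lambda j: abs(predicted[j] - predicted[i]))[:k]
            # The imputed value is the observed value of one randomly chosen donor
            completed[i] = y[rng.choice(donors)]
    return completed


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, None, 8.2, 9.8, None]
print(pmm_impute(x, y, k=2, rng=random.Random(0)))
```

Because every imputed value is copied from a donor, the completed variable can only take values that were actually observed, which is the practical difference from drawing imputations from a normal distribution.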

Analyses in the M imputed data sets

Which variables to include in the imputation model?

Imputing derived variables

Missing outcome variables

Data sources

Descriptive statistics

Comparison of subjects with and without missing data

Complete case analysis

Multiple imputation

Descriptive statistics in the imputed data sets

Logistic regression in the imputed data sets

Findings

Discussion

Journal: Canadian Journal of Cardiology
Publication Date: Dec 1, 2020
Citations: 384
License type: cc-by
