Multiple imputation by predictive mean matching in cluster-randomized trials

Brittney E Bailey, Rebecca Andridge, Abigail B Shoben

doi:10.1186/s12874-020-00948-6


Open Access

https://doi.org/10.1186/s12874-020-00948-6


Abstract

Background: Random effects regression imputation has been recommended for multiple imputation (MI) in cluster randomized trials (CRTs) because it is congenial to analyses that use random effects regression. This method relies heavily on model assumptions and may not be robust to misspecification of the imputation model. MI by predictive mean matching (PMM) is a semiparametric alternative, but current software for multilevel data relies on imputation models that ignore clustering or use fixed effects for clusters. When used directly for imputation, these two models result in underestimation (ignoring clustering) or overestimation (fixed effects for clusters) of variance estimates.

Methods: We develop MI procedures based on PMM that leverage these opposing estimated biases in the variance estimates in one of three ways: weighting the distance metric (PMM-dist), weighting the average of the final imputed values from two PMM procedures (PMM-avg), or performing a weighted draw from the final imputed values from the two PMM procedures (PMM-draw). We use Monte Carlo simulations to evaluate our newly proposed methods relative to established MI procedures, focusing on estimation of treatment group means and their variances after MI.

Results: The proposed PMM procedures reduce the bias in the MI variance estimator relative to established methods when the imputation model is correctly specified, and are generally more robust to model misspecification than even the random effects imputation methods.

Conclusions: The PMM-draw procedure in particular is a promising method for multiply imputing missing data from CRTs that can be readily implemented in existing statistical software.
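The idea of combining two PMM procedures by a weighted draw can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pmm_impute` and `weighted_draw`, the simulated data, the individual-level covariate `x`, the donor pool size `k=5`, and the weight `w=0.5` are all assumptions made for illustration. The PMM step is also simplified: it uses least-squares point estimates and omits the parameter draw that a proper MI procedure would include.

```python
import numpy as np

def pmm_impute(y, X, k=5, rng=None):
    """Simplified PMM: fill each missing y with the observed value of a donor
    chosen at random among the k observed units with the closest predicted mean.
    (Assumption: no posterior draw of regression parameters is performed.)"""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    obs = ~np.isnan(y)
    # Fit the imputation model on observed rows only.
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    pred = X @ beta
    donor_pred, donor_y = pred[obs], y[obs]
    out = y.copy()
    for i in np.flatnonzero(~obs):
        # Distance in predicted-mean space between recipient i and each donor.
        nearest = np.argsort(np.abs(donor_pred - pred[i]))[:k]
        out[i] = donor_y[rng.choice(nearest)]
    return out

def weighted_draw(imp_a, imp_b, w, rng=None):
    """For each unit, keep the value from imp_a with probability w, else imp_b."""
    rng = np.random.default_rng() if rng is None else rng
    pick_a = rng.random(len(imp_a)) < w
    return np.where(pick_a, imp_a, imp_b)

# Hypothetical CRT data: 6 clusters of 20, treatment assigned by cluster.
rng = np.random.default_rng(0)
n_clusters, m = 6, 20
cluster = np.repeat(np.arange(n_clusters), m)
treat = (cluster % 2).astype(float)
x = rng.normal(size=cluster.size)
y = (1.0 + 0.5 * treat + 0.8 * x
     + rng.normal(size=n_clusters)[cluster] + rng.normal(size=cluster.size))
y_miss = y.copy()
y_miss[rng.random(y.size) < 0.3] = np.nan

# Model 1 ignores clustering; model 2 uses one indicator (fixed effect) per cluster.
X_ignore = np.column_stack([np.ones_like(x), treat, x])
X_fixed = np.column_stack([np.eye(n_clusters)[cluster], x])

imp_ignore = pmm_impute(y_miss, X_ignore, rng=rng)
imp_fixed = pmm_impute(y_miss, X_fixed, rng=rng)
imp = weighted_draw(imp_ignore, imp_fixed, w=0.5, rng=rng)
```

Repeating the last three lines several times would yield multiple completed data sets. How the weight should be chosen is not specified here, so `w=0.5` is only a placeholder.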

Highlights

Random effects regression imputation has been recommended for multiple imputation (MI) in cluster randomized trials (CRTs) because it is congenial to analyses that use random effects regression
Parametric procedures for imputing missing data from a CRT are commonly used in practice, but a semiparametric procedure for imputation should be more robust to misspecification of the imputation model
Estimation and inference after multiple imputation: when data are missing from a CRT, how we handle the missing data depends in part on the reason behind the missing data

Summary

Introduction

Random effects regression imputation has been recommended for multiple imputation (MI) in cluster randomized trials (CRTs) because it is congenial to analyses that use random effects regression. This method relies heavily on model assumptions and may not be robust to misspecification of the imputation model. MI by predictive mean matching (PMM) is a semiparametric alternative, but current software for multilevel data relies on imputation models that ignore clustering or use fixed effects for clusters.

Estimation and inference after multiple imputation: when data are missing from a CRT, how we handle the missing data depends in part on the reason behind the missing data. These missing data mechanisms are commonly classified into one of three types: missing com-
