Abstract

The nested case-control and case-cohort designs are two main approaches for carrying out a substudy within a prospective cohort. This article takes multiple imputation (MI) methods developed for handling missing covariates in full-cohort studies and adapts them to nested case-control and case-cohort studies. We consider data missing by design and data missing by chance. MI analyses that make use of full-cohort data and MI analyses based on substudy data only are described, alongside an intermediate approach in which the imputation uses full-cohort data but the analysis uses only the substudy. We describe adaptations to two imputation methods: the approximate method (MI-approx) of White and Royston (2009) and the “substantive model compatible” (MI-SMC) method of Bartlett et al. (2015). We also apply the “MI matched set” approach of Seaman and Keogh (2015) to nested case-control studies, which does not require any full-cohort information. The methods are investigated using simulation studies, and all perform well when their assumptions hold. Substantial gains in efficiency can be made by imputing data missing by design using the full-cohort approach or by imputing data missing by chance in analyses using the substudy only. The intermediate approach brings greater gains in efficiency relative to the substudy approach and is more robust to imputation model misspecification than the full-cohort approach. The methods are illustrated using the ARIC Study cohort. Supplementary Materials provide R and Stata code.

Highlights

  • In the full-cohort and intermediate MI approaches, the MI-SMC algorithm just described is applied to all the data on the full cohort

  • Additional simulation scenario (model misspecification): we investigated the performance of the methods when the imputation model is misspecified


Introduction

(1) Replace the missing values in $X$ by arbitrary starting values, to create a complete data set.

(2) If $X_k$ is a continuous variable, fit the imputation model $X_k = \alpha_0 + \alpha_1 Z + \alpha_2 D + \alpha_3 H(T) + \alpha_4 X_{-k} + \epsilon$, with residual error variance $\sigma^2$, to the subset of individuals for whom $X_k$ is observed, using the current values of $X_{-k}$.

(3) If $X_k$ is continuous, for each individual with missing $X_k$ in the original data set, replace the current value of $X_k$ with a sample from a normal distribution with mean
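As an illustration only, steps (1) to (3) can be sketched in Python for the simplest case of a single incomplete continuous covariate, so that the $X_{-k}$ term drops out of the imputation model. The excerpt above is cut off before it states the mean of the normal draw, so this sketch assumes the standard "proper" regression imputation: $\sigma^2$ and the regression coefficients are first drawn from their approximate posterior, and the mean is the resulting linear predictor. Taking $H(T)$ to be the Nelson-Aalen estimate of the cumulative hazard is also an assumption, and the function names `nelson_aalen` and `impute_continuous` are hypothetical, not taken from the paper or its supplementary code.

```python
import numpy as np


def nelson_aalen(time, event):
    """Nelson-Aalen cumulative hazard, evaluated at each subject's own time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    uniq = np.unique(time)  # sorted distinct follow-up times
    at_risk = np.array([(time >= t).sum() for t in uniq])
    events = np.array([((time == t) & (event == 1)).sum() for t in uniq])
    cumhaz = np.cumsum(events / at_risk)
    # Map each subject's time back to its position among the distinct times.
    return cumhaz[np.searchsorted(uniq, time)]


def impute_continuous(x, z, d, t, rng):
    """One imputation of a continuous covariate x (NaN = missing).

    Assumed imputation model: x = a0 + a1*z + a2*d + a3*H(t) + error.
    """
    x = np.asarray(x, dtype=float)
    miss = np.isnan(x)
    h = nelson_aalen(t, d)
    design = np.column_stack([np.ones_like(h), z, d, h])

    # Step (1): arbitrary starting values. With one incomplete variable they
    # are overwritten below; with several, they would feed the other models.
    x_imp = x.copy()
    x_imp[miss] = np.nanmean(x)

    # Step (2): fit the linear imputation model to subjects with x observed.
    a_obs, y_obs = design[~miss], x[~miss]
    beta_hat, *_ = np.linalg.lstsq(a_obs, y_obs, rcond=None)
    resid = y_obs - a_obs @ beta_hat
    dof = a_obs.shape[0] - a_obs.shape[1]

    # Assumption: draw sigma^2 and the coefficients from their approximate
    # posterior so that parameter uncertainty is propagated.
    sigma2 = resid @ resid / rng.chisquare(dof)
    cov = sigma2 * np.linalg.inv(a_obs.T @ a_obs)
    beta = rng.multivariate_normal(beta_hat, cov)

    # Step (3): replace each missing value with a normal draw centred on the
    # linear predictor.
    noise = rng.normal(0.0, np.sqrt(sigma2), size=miss.sum())
    x_imp[miss] = design[miss] @ beta + noise
    return x_imp
```

Calling `impute_continuous` repeatedly with the same `rng` yields the multiple imputed data sets. With several incomplete covariates, steps (2) and (3) would be cycled over each $X_k$ in turn, with the current values of $X_{-k}$ added as columns of the design matrix.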
