Abstract

Analysis of matched case-control studies is often complicated by missing data on covariates. Analysis can be restricted to individuals with complete data, but this is inefficient and may be biased. Multiple imputation (MI) is an efficient and flexible alternative. We describe two MI approaches. The first uses a model for the data on an individual and includes matching variables; the second uses a model for the data on a whole matched set and avoids the need to model the matching variables. Within each approach, we consider three methods: full-conditional specification (FCS), joint model MI using a normal model, and joint model MI using a latent normal model. We show that FCS MI is asymptotically equivalent to joint model MI using a restricted general location model that is compatible with the conditional logistic regression analysis model. The normal and latent normal imputation models are not compatible with this analysis model. All methods allow for multiple partially observed covariates, non-monotone missingness, and multiple controls per case. They can be easily applied in standard statistical software, and valid variance estimates can be obtained using Rubin's Rules. We compare the methods in a simulation study. The approach of including the matching variables is the most efficient. Within each approach, the FCS MI method generally yields the least-biased odds ratio estimates, but normal or latent normal joint model MI is sometimes more efficient. All methods have good confidence interval coverage. Data on colorectal cancer and fibre intake from the EPIC-Norfolk study are used to illustrate the methods, in particular showing how efficiency is gained relative to using only individuals with complete data.
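As a reminder of how Rubin's Rules combine results across imputed data sets, the following is a minimal sketch in Python. It is not taken from the paper: the function name `pool_rubin` and the numerical inputs are hypothetical and serve only to illustrate the standard pooling formulas for a single scalar parameter, such as a log odds ratio.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one scalar parameter across M imputed data sets (Rubin's Rules).

    estimates: point estimates, one per imputed data set
    variances: the corresponding squared standard errors
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor M - 1)
    t = w + (1 + 1 / m) * b      # total variance
    return q_bar, t


# Hypothetical log odds ratios and variances from M = 3 imputed data sets.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The total variance adds the between-imputation component, inflated by a factor of 1 + 1/M, to the average within-imputation variance, so uncertainty due to the missing data is carried through to the standard error.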

Highlights

  • Case-control studies are used to investigate associations between disease and putative risk factors

  • When the imputation model is correctly specified and is compatible with the analysis model, i.e. there exists a model for the joint distribution of all the variables that implies the analysis and imputation models as submodels, and data are missing at random (MAR), joint model multiple imputation (MI) gives consistent parameter and variance estimates for the analysis model

  • We propose using full-conditional specification (FCS) MI with a set of conditional models that is compatible with a restricted general location joint model, and that is asymptotically equivalent to joint model MI under that model


Introduction

Case-control studies are used to investigate associations between disease and putative risk factors. Confounding of observed associations can be handled at the design stage by matching cases and controls on confounders, at the analysis stage by adjusting for confounders using a regression model, or by a combination of these. In matched case-control studies, each case is individually matched with one or more controls on a subset of confounders, and the usual analysis uses conditional logistic regression (CLR) to control for the remaining confounders. When covariate data are missing, a common solution is to restrict analysis to individuals with complete data. Where exclusion of a case or control leaves a matched set in which the remaining members are either all cases or all controls, the whole set ceases to contribute information to the CLR estimating equations.
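The loss of whole matched sets under a complete-case restriction can be illustrated with a short Python sketch. This is an illustration under assumed names, not code from the paper: the record layout (`set_id`, `is_case`, `covariates`, with `None` marking a missing value) and the function `informative_sets` are hypothetical.

```python
from collections import defaultdict


def informative_sets(records):
    """Return the matched sets that still inform a CLR after complete-case
    restriction.

    records: iterable of dicts with keys 'set_id', 'is_case' and
    'covariates' (a list in which None marks a missing value).
    """
    by_set = defaultdict(list)
    for rec in records:
        # Keep only individuals with no missing covariate values.
        if all(value is not None for value in rec["covariates"]):
            by_set[rec["set_id"]].append(rec)

    kept = {}
    for set_id, members in by_set.items():
        has_case = any(r["is_case"] for r in members)
        has_control = any(not r["is_case"] for r in members)
        # A set with only cases or only controls contributes nothing.
        if has_case and has_control:
            kept[set_id] = members
    return kept


# Hypothetical data: three matched sets, one covariate each.
example = [
    {"set_id": 1, "is_case": True, "covariates": [2.1]},
    {"set_id": 1, "is_case": False, "covariates": [1.7]},
    {"set_id": 2, "is_case": True, "covariates": [None]},   # case excluded
    {"set_id": 2, "is_case": False, "covariates": [3.0]},
    {"set_id": 2, "is_case": False, "covariates": [2.4]},
    {"set_id": 3, "is_case": True, "covariates": [1.2]},
    {"set_id": 3, "is_case": False, "covariates": [None]},  # control excluded
    {"set_id": 3, "is_case": False, "covariates": [0.9]},
]
kept = informative_sets(example)
```

In this made-up example, set 2 is lost entirely because its only case has a missing covariate, even though both of its controls are fully observed, whereas set 3 survives with one control fewer.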
