Comparison of Two Approaches for Handling Missing Covariates in Logistic Regression

Chao-Ying Joanne Peng, Jin Zhu

doi:10.1177/0013164407305582

Abstract

Over the past 25 years, methodological advances have been made in missing data treatment. Most published work has focused on missing data in dependent variables under various conditions. The present study seeks to fill the void by comparing two approaches for handling missing data in categorical covariates in logistic regression: the expectation-maximization (EM) method of weights and multiple imputation (MI). Sample data are drawn randomly from a population with known characteristics. Missing data on covariates are simulated under two conditions, missing completely at random and missing at random, with different missing rates. A logistic regression model is fit to each sample using either the EM or MI approach. The performance of the two approaches is compared on four criteria: bias, efficiency, coverage, and rejection rate. Results generally favor MI over EM. Practical issues such as implementation, inclusion of continuous covariates, and interactions between covariates are discussed.
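
The two missingness conditions named in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation design: the covariate names (x1, x2), the coefficients, the sample size, and the missing rates are all hypothetical, and missingness at random is assumed here to depend only on the observed outcome y.

```python
import math
import random


def simulate_sample(n, rng):
    """Draw n rows with two binary covariates and a binary outcome.

    The logistic coefficients below are hypothetical illustration values.
    """
    rows = []
    for _ in range(n):
        x1 = 1 if rng.random() < 0.5 else 0
        x2 = 1 if rng.random() < 0.4 else 0
        logit = -0.5 + 1.0 * x1 + 0.7 * x2
        p = 1.0 / (1.0 + math.exp(-logit))
        y = 1 if rng.random() < p else 0
        rows.append({"x1": x1, "x2": x2, "y": y})
    return rows


def impose_mcar(rows, rate, rng):
    """MCAR: each x2 value is deleted with the same probability."""
    return [dict(r, x2=None) if rng.random() < rate else dict(r) for r in rows]


def impose_mar(rows, rate_y0, rate_y1, rng):
    """MAR (assumed form): the chance that x2 is missing depends only on y,
    which is always observed, and not on the value of x2 itself."""
    out = []
    for r in rows:
        rate = rate_y1 if r["y"] == 1 else rate_y0
        out.append(dict(r, x2=None) if rng.random() < rate else dict(r))
    return out


def missing_rate(rows, y=None):
    """Share of rows with x2 missing, optionally restricted to one outcome."""
    subset = [r for r in rows if y is None or r["y"] == y]
    return sum(r["x2"] is None for r in subset) / len(subset)


rng = random.Random(0)
rows = simulate_sample(5000, rng)
mcar = impose_mcar(rows, 0.2, rng)
mar = impose_mar(rows, 0.1, 0.3, rng)
print("MCAR missing rate:", round(missing_rate(mcar), 3))
print("MAR missing rate by outcome:",
      round(missing_rate(mar, 0), 3), round(missing_rate(mar, 1), 3))
```

Under MCAR the missing rate is roughly the same in both outcome groups, while under this MAR mechanism it differs by outcome. That difference is what makes an analysis restricted to complete cases potentially misleading and motivates methods such as EM or MI.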
