Abstract

Multiple imputation is considered a superior practice for handling missing values in data analysis, and this holds in particular for the important case of categorical variables. However, applying imputation techniques can be daunting for the less technical data practitioner. In the case of categorical variables, imputation can be avoided by recoding the indicator matrix of the data set before analysis: for each variable, a new category level is created to which its missing values are assigned. The recoded indicator matrix thus separates missing from observed responses. Subset multiple correspondence analysis (sMCA) allows a focused analysis of either the missing or the non-missing subset of the recoded indicator matrix while preserving the original variation of the data. The main purpose of this article is to evaluate the sMCA approach using simulated data generated under various parameter settings. In addition, the visualization of multivariate categorical data containing missing values that follows from the sMCA approach is presented. This article can serve as a guide when deciding whether the sMCA approach is suitable for the analysis of incomplete multivariate categorical data. To make the presented methodology easy to apply, a complete R script with commented functions is available as supplementary material.
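
The recoding step described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the article's supplementary R script. The toy data, the `MISSING` label, and the helper `recoded_indicator` are all hypothetical, and the sketch assumes that the extra level is added only to variables that actually contain missing values. It builds the recoded indicator matrix and then splits its columns into the observed and missing subsets; the subset analysis itself is not shown.

```python
# Hypothetical toy data: each row is a respondent, each key a categorical variable.
# None marks a missing response.
data = [
    {"colour": "red", "size": "small"},
    {"colour": None, "size": "large"},
    {"colour": "blue", "size": None},
    {"colour": "red", "size": "large"},
]

# Assumed label for the extra category level; it must not clash with a real level.
MISSING = "missing"


def recoded_indicator(rows, variables):
    """Build a 0/1 indicator matrix with an extra 'missing' level per variable.

    Returns (columns, matrix), where columns is a list of (variable, level)
    pairs and matrix has one row per respondent and one column per pair.
    """
    columns = []
    for var in variables:
        # Observed levels first, in sorted order.
        observed = sorted({row[var] for row in rows if row[var] is not None})
        columns.extend((var, level) for level in observed)
        # Add the extra level only if this variable has missing values.
        if any(row[var] is None for row in rows):
            columns.append((var, MISSING))

    matrix = []
    for row in rows:
        # Missing responses are assigned to the new level instead of being imputed.
        coded = {var: (row[var] if row[var] is not None else MISSING) for var in variables}
        matrix.append([1 if coded[var] == level else 0 for var, level in columns])
    return columns, matrix


columns, Z = recoded_indicator(data, ["colour", "size"])

# Column indices of the two subsets that a subset analysis could focus on.
observed_idx = [j for j, (_, level) in enumerate(columns) if level != MISSING]
missing_idx = [j for j, (_, level) in enumerate(columns) if level == MISSING]

for label, col in zip(columns, zip(*Z)):
    print(label, list(col))
```

Every row of `Z` still sums to the number of variables, because each respondent falls into exactly one level per variable, observed or missing. This is why no rows need to be dropped and no values need to be imputed before the analysis.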
