Abstract

Applications of multiple imputation have long outgrown the traditional context of dealing with item nonresponse in cross-sectional data sets. Nowadays multiple imputation is also applied to impute missing values in hierarchical data sets, address confidentiality concerns, combine data from different sources, or correct measurement errors in surveys. However, software development has not kept up with these extensions. Most imputation software can only deal with item nonresponse in cross-sectional settings, and extensions for hierarchical data, if available at all, are typically limited in scope. Furthermore, to our knowledge no software is currently available for dealing with measurement error using multiple imputation approaches. The R package hmi tries to close some of these gaps. It offers multiple imputation routines in hierarchical settings for many variable types (for example, nominal, ordinal, or continuous variables). It also provides imputation routines for interval data and handles a common measurement error problem in survey data: biased inferences due to implicit rounding of the reported values. The user-friendly setup, which requires only the data and optionally the specification of the analysis model of interest, makes the package especially attractive for users less familiar with the peculiarities of multiple imputation. Compatibility with the popular mice package (Van Buuren and Groothuis-Oudshoorn 2011) ensures that the rich set of analysis, diagnostic, and post-imputation tools available in mice can be used easily once the data have been imputed.

Highlights

  • Forty years after Donald Rubin’s seminal paper (Rubin, 1978), which introduced the concept of multiple imputation, the approach has proven useful in many contexts that go far beyond the classical item nonresponse in cross-sectional surveys for which it was originally proposed (Reiter/Raghunathan, 2007).

  • The hmi function returns two elements within the mids object that are not available from mice: gibbs and pooling. The former allows checking the convergence of the Gibbs sampler chains generated by MCMCglmm.

  • With hmi we provide comprehensive but easy-to-use tools for multiple imputation of hierarchical data sets.


Summary

Introduction

Forty years after Donald Rubin’s seminal paper (Rubin, 1978), which introduced the concept of multiple imputation, the approach has proven useful in many contexts that go far beyond the classical item nonresponse in cross-sectional surveys for which it was originally proposed (Reiter/Raghunathan, 2007). As discussed in Heitjan/Rubin (1991), coarse data are data for which the true values are not observed precisely. This includes missing data as a special case, as well as rounded, grouped, censored, and interval data. The hmi package offers routines for imputing plausible values when it is only known (for some of the observations) that the exact value lies in a certain interval, for example because the data are censored. Such imputation routines are otherwise only available in Stata. The package also provides imputation routines for semi-continuous variables, that is, variables that have a spike at one value (typically zero) but can be considered continuous otherwise. These imputation routines are available in several software packages but are not offered in mice.
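To make the idea of imputing interval data concrete, the following minimal sketch draws plausible values for observations that are only known to lie within an interval. It is written in Python using only the standard library and is not the hmi implementation or its API. The function names `impute_interval` and `impute_datasets` are hypothetical, and the sketch assumes a normal model with a fixed, known mean and standard deviation. A proper multiple imputation procedure would also draw the model parameters so that estimation uncertainty is reflected in the imputations.

```python
import random
from statistics import NormalDist


def impute_interval(lower, upper, mean, sd, rng):
    """Draw one value from N(mean, sd) truncated to [lower, upper].

    Assumption for illustration: mean and sd are treated as known.
    """
    dist = NormalDist(mean, sd)
    # Probabilities of the interval bounds under the assumed model.
    p_lo = dist.cdf(lower)
    p_hi = dist.cdf(upper)
    # Sample a probability between the bounds and invert the CDF.
    u = rng.uniform(p_lo, p_hi)
    # inv_cdf requires 0 < u < 1, so keep u strictly inside that range.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    value = dist.inv_cdf(u)
    # Guard against floating-point drift outside the interval.
    return min(max(value, lower), upper)


def impute_datasets(intervals, mean, sd, m=5, seed=1):
    """Return m completed data sets for a list of (lower, upper) pairs.

    Exactly observed values are encoded as lower == upper and kept as is.
    """
    rng = random.Random(seed)
    return [
        [lo if lo == hi else impute_interval(lo, hi, mean, sd, rng)
         for lo, hi in intervals]
        for _ in range(m)
    ]


if __name__ == "__main__":
    # Hypothetical income reports: an interval, an exact value,
    # and a right-censored value (only a lower bound is known).
    reports = [(1000.0, 2000.0), (1500.0, 1500.0), (3000.0, float("inf"))]
    for dataset in impute_datasets(reports, mean=2000.0, sd=800.0, m=3):
        print([round(v, 1) for v in dataset])
```

Each completed data set keeps the exactly observed value unchanged and replaces the interval and censored reports with draws that respect the reported bounds. The differences between the data sets express the uncertainty about the unobserved exact values.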

Multiple imputation for hierarchical data sets
Multilevel linear models
Multilevel generalized linear models
Dealing with missing values in hierarchical data
Multiple imputation using multilevel models
Existing imputation routines for hierarchical data and their limitations
Our contribution for the imputation of hierarchical data
Multiple imputation for interval data
Analyzing interval data
Methodology of multiple imputation for interval data
Our contribution for the imputation of interval data
Multiple imputation for data affected by heaping
Analyzing rounded data
Methodology of multiple imputation for data affected by heaping
Our contribution for the imputation of data affected by heaping
Software
Checks and preparations
Imputation cycles
The different supported types of variables
Pre-definition of the variable types
Output of hmi
Convergence checks
Pooling
Multilevel data
Before starting imputation
Running the imputation
Monitoring convergence
Analyzing the imputed data
Interval data
Some useful functions for interval data
Variables affected by heaping
Findings
Conclusion
Suggestion for rounding degrees
