Abstract
Missing data occur in many types of studies and typically complicate the analysis. Multiple imputation, either using joint modeling or the more flexible fully conditional specification approach, is popular and works well in standard settings. In settings involving nonlinear associations or interactions, however, incompatibility of the imputation model with the analysis model is an issue that often results in bias. Similarly, complex outcomes such as longitudinal or survival outcomes cannot be adequately handled by standard implementations. In this paper, we introduce the R package JointAI, which utilizes the Bayesian framework to perform simultaneous analysis and imputation in regression models with incomplete covariates. Using a fully Bayesian joint modeling approach, it overcomes the issue of uncongeniality while retaining the attractive flexibility of fully conditional specification multiple imputation by specifying the joint distribution of analysis and imputation models as a sequence of univariate models that can be adapted to the type of variable. JointAI provides functions for Bayesian inference with generalized linear and generalized linear mixed models and extensions thereof, as well as survival models and joint models for longitudinal and survival data. These functions take arguments analogous to those of the corresponding well-known functions for the analysis of complete data from base R and other packages. Usage and features of JointAI are described and illustrated using various examples, and the theoretical background is outlined.
Highlights
Missing data are a challenge common to the analysis of data from virtually all kinds of studies
We introduce the R package JointAI, which performs joint analysis and imputation of regression models with incomplete covariates under the missing at random (MAR) assumption (Rubin 1976), and explain how data with incomplete covariate information can be analyzed and imputed with it
This is an important difference from standard fully conditional specification (FCS), where the full conditional distributions used to impute missing values are specified directly, usually as regression models, and the outcome must be explicitly included in the linear predictor of the imputation model
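The contrast with FCS can be made concrete with a short sketch of the factorization that the abstract describes as "a sequence of univariate models". The notation is an assumption introduced here for illustration: y is the outcome, x_obs the completely observed covariates, x_mis_1, ..., x_mis_q the incomplete covariates, and θ = (θ_{y|x}, θ_x) the parameters of the analysis and covariate models.

```latex
\begin{align*}
% Joint distribution: analysis model times the model for the incomplete covariates
p(y, x_{\text{mis}} \mid x_{\text{obs}}, \theta)
  &= p(y \mid x_{\text{obs}}, x_{\text{mis}}, \theta_{y \mid x})\,
     p(x_{\text{mis}} \mid x_{\text{obs}}, \theta_{x}), \\
% Covariate part written as a sequence of univariate conditional models
p(x_{\text{mis}_1}, \ldots, x_{\text{mis}_q} \mid x_{\text{obs}}, \theta_{x})
  &= p(x_{\text{mis}_1} \mid x_{\text{obs}}, \theta_{x_1})
     \prod_{k=2}^{q} p(x_{\text{mis}_k} \mid x_{\text{obs}},
       x_{\text{mis}_1}, \ldots, x_{\text{mis}_{k-1}}, \theta_{x_k}), \\
% Full conditional of one incomplete covariate, derived rather than specified
p(x_{\text{mis}_k} \mid y, x_{\text{obs}}, x_{\text{mis}_{-k}}, \theta)
  &\propto p(y \mid x_{\text{obs}}, x_{\text{mis}}, \theta_{y \mid x})\,
     p(x_{\text{mis}} \mid x_{\text{obs}}, \theta_{x}).
\end{align*}
```

Under these assumptions, none of the covariate models conditions on y. The outcome enters the full conditional in the last line only through the analysis model, which is why it does not have to appear in the linear predictor of an imputation model.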
Summary
Missing data are a challenge common to the analysis of data from virtually all kinds of studies. The R package jomo (Quartagno and Carpenter 2020) performs joint model multiple imputation in the Bayesian framework using a multivariate normal distribution and includes an extension to the standard approach to ensure compatibility between analysis model and imputation models. It can handle generalized linear (mixed) models, cumulative link mixed models, proportional odds probit regression and Cox proportional hazards models. We conclude the paper with an outlook on planned extensions and discuss the limitations that are introduced by the assumptions made in the fully Bayesian approach.