Abstract

Missing data occur in many types of studies and typically complicate the analysis. Multiple imputation, using either joint modeling or the more flexible fully conditional specification approach, is popular and works well in standard settings. In settings involving nonlinear associations or interactions, however, incompatibility of the imputation model with the analysis model is an issue that often results in bias. Similarly, complex outcomes such as longitudinal or survival outcomes cannot be adequately handled by standard implementations. In this paper, we introduce the R package JointAI, which utilizes the Bayesian framework to perform simultaneous analysis and imputation in regression models with incomplete covariates. Using a fully Bayesian joint modeling approach, it overcomes the issue of uncongeniality while retaining the attractive flexibility of fully conditional specification multiple imputation. It does so by specifying the joint distribution of analysis and imputation models as a sequence of univariate models that can be adapted to the type of variable. JointAI provides functions for Bayesian inference with generalized linear and generalized linear mixed models and extensions thereof, as well as survival models and joint models for longitudinal and survival data. These functions take arguments analogous to those of the corresponding well-known functions for the analysis of complete data from base R and other packages. Usage and features of JointAI are described and illustrated using various examples, and the theoretical background is outlined.
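The "sequence of univariate models" can be made concrete with a short factorization. The notation here is an assumption introduced for illustration and does not appear in the abstract: y is the outcome, x_obs the completely observed covariates, x_mis,1, ..., x_mis,p the incomplete covariates, and theta = (theta_y, theta_x) the parameters of the analysis and covariate models. Under these assumptions, a sketch of the joint distribution (using the amsmath align* environment) is:

```latex
\begin{align*}
p(y, x_{mis} \mid x_{obs}, \theta)
  &= p(y \mid x_{obs}, x_{mis}, \theta_{y}) \; p(x_{mis} \mid x_{obs}, \theta_{x}) \\
  &= \underbrace{p(y \mid x_{obs}, x_{mis}, \theta_{y})}_{\text{analysis model}}
     \; \prod_{k=1}^{p}
     \underbrace{p\bigl(x_{mis,k} \mid x_{obs}, x_{mis,1}, \ldots, x_{mis,k-1}, \theta_{x_k}\bigr)}_{\text{univariate covariate model}}
\end{align*}
```

Each factor in the product is a univariate model, so its type (for example linear, logistic or cumulative logit) can be chosen to match the variable it describes. In the Bayesian framework, this expression is multiplied by a prior for theta to obtain the joint posterior up to a normalizing constant.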

Highlights

  • Missing data are a challenge common to the analysis of data from virtually all kinds of studies

  • We introduce the R package JointAI, which performs joint analysis and imputation of regression models with incomplete covariates under the missing at random (MAR) assumption (Rubin 1976), and explain how data with incomplete covariate information can be analyzed and imputed with it

  • This is an important difference from standard fully conditional specification (FCS), where the full conditional distributions used to impute missing values are specified directly, usually as regression models, and require the outcome to be explicitly included in the linear predictor of the imputation model
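The contrast with standard FCS in the last highlight can be sketched in formulas. The notation is assumed for illustration only: y is the outcome, x_obs the completely observed covariates, x_mis,k the k-th of p incomplete covariates, x_{-k} all covariates except x_mis,k, g a link function and alpha the regression coefficients of a directly specified imputation model. The second line further assumes that the joint distribution is factorized into an analysis model and a sequence of univariate covariate models.

```latex
% Standard FCS: the full conditional is specified directly as a regression
% model, with the outcome y as an explicit term in the linear predictor.
\[
g\bigl(E[x_{mis,k} \mid y, x_{-k}]\bigr)
  = \alpha_{0} + \alpha_{y}\, y + \sum_{j \neq k} \alpha_{j}\, x_{j}
\]

% Joint specification: the full conditional is implied by the joint
% distribution; factors that do not involve x_{mis,k} are constants and drop out.
\[
p(x_{mis,k} \mid y, x_{-k}, \theta)
  \;\propto\;
  p(y \mid x_{obs}, x_{mis}, \theta_{y})
  \prod_{j=k}^{p} p\bigl(x_{mis,j} \mid x_{obs}, x_{mis,1}, \ldots, x_{mis,j-1}, \theta_{x_j}\bigr)
\]
```

In the second expression the outcome enters through the analysis model itself rather than through an added regression term, so a nonlinear effect or interaction in the analysis model carries over to the imputation of x_mis,k.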


Summary

Introduction

Missing data are a challenge common to the analysis of data from virtually all kinds of studies. The R package jomo (Quartagno and Carpenter 2020) performs joint model multiple imputation in the Bayesian framework using a multivariate normal distribution and includes an extension to the standard approach to ensure compatibility between analysis model and imputation models. It can handle generalized linear (mixed) models, cumulative link mixed models, proportional odds probit regression and Cox proportional hazards models. We conclude the paper with an outlook on planned extensions and discuss the limitations that are introduced by the assumptions made in the fully Bayesian approach.

Theoretical background
Analysis model
Imputation part
Prior distributions
Package structure
Example data
The NHANES data
The simLong data
The PBC data
Model specification
Specification of the model formula
Multi-level structure and longitudinal covariates
Survival models
Joint models
Covariate model types
Auxiliary variables
Reference values for categorical covariates
Hyper-parameters
Scaling
Shrinkage priors
JAGS model file
MCMC settings
Parameters to follow
Initial values
Parallel sampling
After fitting
Visualizing the posterior sample
Model summary
Evaluation criteria
Subset of the MCMC sample
Predicted values
Export of imputed values
Assumptions and extensions
Density plot using ggplot2