Abstract

Motivation: Combination of multiple datasets is routine in modern epidemiology. However, studies may have measured different sets of variables; this is often inefficiently dealt with by excluding studies or dropping variables. Multilevel multiple imputation methods to impute these ‘systematically’ missing data (as opposed to ‘sporadically’ missing data within a study) are available, but problems may arise when many random effects are needed to allow for heterogeneity across studies. We show that the Bayesian IMputation and Analysis Model (BIMAM) implemented in our tool works well in this situation.

General features: BIMAM performs imputation and analysis simultaneously. It imputes both binary and continuous systematically and sporadically missing data, and analyses binary and continuous outcomes. BIMAM is a user-friendly, freely available tool that does not require knowledge of Bayesian methods. BIMAM is an R Shiny application. It is downloadable to a local machine and it automatically installs the required freely available packages (R packages, including R2MultiBUGS and MultiBUGS).

Availability: BIMAM is available at [www.alecstudy.org/bimam].

Highlights

  • The European Community Respiratory Health Survey (ECRHS) study, which is the source of the example dataset used in this paper, was performed with the approval of the corresponding local/regional committees for all participating centres, and with written informed consent obtained from all participants. In collaborative epidemiological projects that combine information across multiple datasets to estimate the associations of risk factors with a disease trait or find its best set of predictors, a major issue is how to deal with studies that have measured different sets of variables.

  • Whereas these methods are based on fully conditional specification (FCS) of the imputation model, where a conditional distribution is defined for each missing variable, others have been developed based on joint modelling (JM), where a multivariate joint distribution is specified for all variables in the imputation model.[7]

  • Bayesian methods tend to perform better than classical methods in this situation, but this advantage may be limited if the Bayesian framework is used only for the imputation and not for the analysis model, as in the Bayesian imputation approaches reviewed by Audigier et al.[8]

Introduction

The ECRHS study, which is the source of the example dataset used in this paper, was performed with the approval of the corresponding local/regional committees for all participating centres, and with written informed consent obtained from all participants.

In collaborative epidemiological projects that combine information across multiple datasets to estimate the associations of risk factors with a disease trait or find its best set of predictors, a major issue is how to deal with studies that have measured different sets of variables. When pooling data from different populations, or from studies with different methods, there is often heterogeneity across datasets in the size of the associations, both of the risk factors with the outcome (analysis model) and of the predictors with the missing variable (imputation model).
