Abstract
Multiple imputation is now well established as a practical and flexible method for analyzing partially observed data, particularly under the missing at random assumption. However, when the substantive model is a weighted analysis, there is concern about the empirical performance of Rubin’s rules and also about how to appropriately incorporate possible interaction between the weights and the distribution of the study variables. One approach that has been suggested is to include the weights in the imputation model, potentially also allowing for interactions with the other variables. We show that the theoretical criterion justifying this approach can be approximately satisfied if we stratify the weights to define level-two units in our data set and include random intercepts in the imputation model. Further, if we let the covariance matrix of the variables have a random distribution across the level-two units, we also allow imputation to reflect any interaction between weight strata and the distribution of the variables. We evaluate our proposal in a number of simulation scenarios, showing it has promising performance both in terms of coverage levels of the model parameters and bias of the associated Rubin’s variance estimates. We illustrate its application to a weighted analysis of factors predicting reception-year readiness in children in the UK Millennium Cohort Study.
Highlights
When collecting data for research, we are often unable to obtain all the desired information, for various reasons.
Inference appears acceptable with most imputation methods, as indicated by negligible bias and good coverage levels, but MI-xW, MLMI-Het, and MLMI-SMC are the best methods for variance estimation: they are the methods for which model-based and empirical standard errors are most similar.
MI-S seems to work well, as expected given that the weight strata are large in this example (600 observations per stratum) and within-stratum imputation is not as noisy as in the previous examples.
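To make the idea of weight strata concrete, here is a minimal Python sketch that groups observations into equal-sized strata by the rank of their weights. The function name `weight_strata`, the rank-based (quantile-style) grouping rule, and the example weights are all assumptions made for illustration; the text above does not say how the strata were formed in the study.

```python
def weight_strata(weights, n_strata):
    """Assign each observation to a stratum by the rank of its weight.

    Hypothetical helper: strata are (near-)equal-sized groups of
    observations with similar weights, labeled 0 .. n_strata - 1.
    """
    n = len(weights)
    if not 1 <= n_strata <= n:
        raise ValueError("n_strata must be between 1 and len(weights)")
    # Indices of the observations, ordered from smallest to largest weight.
    order = sorted(range(n), key=lambda i: weights[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        # Integer arithmetic splits the ranks 0 .. n-1 into n_strata blocks.
        labels[i] = rank * n_strata // n
    return labels


# Illustrative weights only; these are not data from the study.
example_weights = [1.0, 5.0, 2.0, 8.0, 3.0, 9.0]
example_labels = weight_strata(example_weights, n_strata=3)
```

The resulting labels could then serve as the level-two (cluster) identifier in a multilevel imputation model, or as the grouping variable for within-stratum imputation.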
Summary
When collecting data for research, we are often unable to obtain all the desired information, for various reasons (e.g., lack of resources, unwillingness to disclose information, loss to follow-up). Such missing data complicate the intended analysis, causing a loss of power and potentially biasing the results when the reason for the missing data is associated with our scientific question. Fitting the substantive analysis model to such an imputed dataset gives the same weight to observed and imputed values; the latter are, at best, good guesses, and they should be somehow down-weighted. Otherwise, such an approach will result in marked underestimation of the standard errors because of a failure to reflect uncertainty due to the missing values.
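Multiple imputation addresses this underestimation by analyzing several imputed datasets and combining the results with Rubin's rules, which the abstract refers to. The sketch below shows the standard pooling formulas for a single scalar parameter: the pooled estimate is the mean of the per-dataset estimates, and the total variance adds the between-imputation variance, inflated by 1 + 1/m, to the average within-imputation variance. The function name `pool_rubin` and the numbers are illustrative assumptions, not results from the study.

```python
def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances using Rubin's rules.

    Returns (pooled estimate, within-imputation variance,
    between-imputation variance, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m >= 2 estimates with matching variances")
    # Pooled point estimate: average over the imputed datasets.
    q_bar = sum(estimates) / m
    # Within-imputation variance: average of the squared standard errors.
    w_bar = sum(variances) / m
    # Between-imputation variance: sample variance of the estimates.
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    # Total variance: the (1 + 1/m) factor accounts for using finite m.
    t = w_bar + (1 + 1 / m) * b
    return q_bar, w_bar, b, t


# Illustrative values only: five imputed datasets, one coefficient.
est = [1.0, 1.2, 0.8, 1.1, 0.9]
var = [0.04, 0.04, 0.04, 0.04, 0.04]
q_bar, w_bar, b, t = pool_rubin(est, var)
```

In this toy example the total variance exceeds the within-imputation variance, which is the extra uncertainty that a single imputed dataset would fail to reflect.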