Abstract
Multiple imputation (MI) is challenging in high-dimensional settings. An imputation model built on a selected subset of predictors can be incompatible with the analysis model, leading to inconsistent and biased estimates. Although full compatibility may not be achievable in such cases, consistent and unbiased estimates can still be obtained with a semi-compatible imputation model. We propose relaxing the lasso penalty so that a large set of variables (at most n, the sample size) is selected. The substantive model, which also relies on a formal variable selection procedure in high-dimensional structures, is then expected to be nested in this imputation model, so the resulting imputation model is semi-compatible with high probability. However, likelihood estimates can become unstable and face convergence issues as the number of variables approaches the sample size. To address this, we further propose a ridge penalty for obtaining the posterior distribution of the parameters given the observed data. The proposed technique is compared with standard MI software and with MI techniques available for high-dimensional data, in simulation studies and on a real-life dataset. Our results show that the proposed approach outperforms existing MI approaches while addressing the compatibility issue.
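The two ingredients named in the abstract, a weakly penalized lasso for choosing the imputation predictors and a ridge penalty for stabilizing the parameter draws, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names (`lasso_cd`, `select_predictors`, `ridge_draw_impute`), the penalty values, the simulated data, and the simplified normal-model draw are all assumptions made for illustration only.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent lasso: minimize (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]          # partial residual without predictor j
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]
    return beta

def select_predictors(X, y, lam, max_size):
    """Indices kept by a weakly penalized lasso, capped at max_size predictors."""
    beta = lasso_cd(X, y, lam)
    active = np.flatnonzero(beta)
    if active.size > max_size:                  # keep the largest coefficients
        order = np.argsort(-np.abs(beta[active]))
        active = active[order[:max_size]]
    return np.sort(active)

def ridge_draw_impute(X_obs, y_obs, X_mis, ridge, rng):
    """One imputation draw from a ridge-stabilized normal linear model (simplified)."""
    n, k = X_obs.shape
    A_inv = np.linalg.inv(X_obs.T @ X_obs + ridge * np.eye(k))
    beta_hat = A_inv @ X_obs.T @ y_obs
    resid = y_obs - X_obs @ beta_hat
    df = max(n - k, 1)                          # crude degrees of freedom (assumption)
    sigma2 = resid @ resid / rng.chisquare(df)  # draw of the residual variance
    chol = np.linalg.cholesky((A_inv + A_inv.T) / 2)
    beta_star = beta_hat + np.sqrt(sigma2) * chol @ rng.standard_normal(k)
    noise = np.sqrt(sigma2) * rng.standard_normal(X_mis.shape[0])
    return X_mis @ beta_star + noise

# Simulated example (hypothetical data): n = 100 cases, p = 200 predictors.
rng = np.random.default_rng(0)
n, p, M = 100, 200, 5
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + X[:, 2] + rng.standard_normal(n)
missing = rng.random(n) < 0.2                   # about 20% of y set to missing
obs = ~missing
y_mean = y[obs].mean()

# A small lam gives a large selected set; the cap keeps it below the observed n.
selected = select_predictors(X[obs], y[obs] - y_mean, lam=0.1,
                             max_size=int(obs.sum()) - 1)

# M imputed versions of the missing y values, each from a fresh parameter draw.
imputations = [
    y_mean + ridge_draw_impute(X[obs][:, selected], y[obs] - y_mean,
                               X[missing][:, selected], ridge=1.0, rng=rng)
    for _ in range(M)
]
```

The ridge term keeps the matrix inversion well conditioned even when the selected set is nearly as large as the number of observed cases, which is the situation the abstract flags as unstable for unpenalized likelihood fits.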
Highlights
Missing data are frequently encountered in biomedical research
An imputation model is said to be semi-compatible if the analysis model is embedded in it
The results show that fitting a rich imputation model becomes problematic as the number of predictors grows, e.g., for p = 200 and p = 500 with sample size n = 100
Summary
Statistical analysis often requires complete cases without missing values, and analysis without appropriate handling of missing values may lead to biased inferences. A variety of statistical methods is available for addressing missing data. Multiple imputation (MI) [1,2,3] has become the most popular approach for handling missing data in practice. MI fills each missing value with more than one plausible value drawn from its predictive distribution given the observed data. MI formally comprises three stages: imputation, analysis, and combining the results of the analysis. M independent imputed values are obtained for each missing value, giving M complete imputed datasets. Each of the M imputed datasets is analyzed using standard statistical techniques for complete
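The combining stage can be sketched as follows, assuming it uses Rubin's rules, the standard pooling rule for MI (the text above does not spell out the rule). The function name `pool_rubin` and the example numbers are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool M point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)             # pooled point estimate
    u_bar = mean(variances)             # within-imputation variance
    b = variance(estimates)             # between-imputation variance (m - 1 denominator)
    total_var = u_bar + (1 + 1 / m) * b
    return q_bar, total_var

# Hypothetical results from M = 3 analyses of imputed datasets.
pooled_estimate, pooled_variance = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
```

The total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/M), so uncertainty due to the missing values carries through to the final inference.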