Abstract

Objective To evaluate the performance of multiple imputation based on expectation maximization with bootstrapping (EMB) for missing quantitative variables in cross-sectional health examination data, and to provide evidence for choosing an appropriate multiple imputation method for such data. Methods We used health examination data from 1 634 employees who underwent routine physical examination at the Xijing Hospital Health Checkup Center in Xi'an, Shaanxi province, from January to December 2013. Missing values were filled by EMB multiple imputation using the Amelia II package in R 3.5.0. Results For cross-sectional quantitative health examination data under three random-missingness scenarios, with univariate missing rates of < 10%, 20%, and 70%, EMB multiple imputation yielded smaller estimation errors than listwise deletion in every scenario. On the same data set, the quality of the imputation varied with the number of imputations (m), and m = 10 was the most suitable for the data used in this study. Probability density curves of the data before and after imputation showed that, at m = 10, the imputed values agreed well with the observed values. The over-fitting diagnostic plots further showed that, at m = 10, the 90% confidence intervals for most observations of each variable contained the best-fit line and were narrow. The multivariate logistic regression models built on the two analysis data sets, one processed with listwise deletion and the other with EMB multiple imputation, retained different variables. Conclusion For quantitative variables missing at random at different rates, EMB multiple imputation performs better than listwise deletion; the optimal number of imputations differs between data sets with different missing-data profiles.
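The EMB procedure named in the abstract can be illustrated with a deliberately small sketch. This is not the Amelia II implementation and not the authors' code: it assumes only two variables, with x always observed and y missing at random, and the names `em_regression` and `emb_impute` are hypothetical. Each of the m imputations draws a bootstrap resample of the rows, runs EM on that resample to estimate the regression of y on x, and then fills the missing y values in the original data from the fitted conditional distribution.

```python
import math
import random


def em_regression(rows, iters=100):
    """EM estimate of y | x ~ N(b0 + b1*x, s2) when some y are None.

    Assumes x is always observed, at least one y is observed, and x is
    not constant; no guard is included for degenerate resamples.
    """
    n = len(rows)
    xs = [x for x, _ in rows]
    obs = [y for _, y in rows if y is not None]
    mx = sum(xs) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    # Start from complete-case moments with no x-y association.
    b0 = sum(obs) / len(obs)
    b1 = 0.0
    s2 = sum((y - b0) ** 2 for y in obs) / len(obs)
    for _ in range(iters):
        sy = syy = sxy = 0.0
        for x, y in rows:
            if y is None:
                # E-step: expected y and y^2 under the current parameters.
                ey = b0 + b1 * x
                eyy = ey * ey + s2
            else:
                ey, eyy = y, y * y
            sy += ey
            syy += eyy
            sxy += x * ey
        # M-step: maximum-likelihood update from the expected statistics.
        my = sy / n
        b1 = (sxy / n - mx * my) / vx
        b0 = my - b1 * mx
        s2 = max(syy / n - my * my - b1 * b1 * vx, 0.0)
    return b0, b1, s2


def emb_impute(rows, m=10, seed=0):
    """Return m completed copies of rows, one per bootstrap + EM run."""
    rng = random.Random(seed)
    completed = []
    for _ in range(m):
        boot = rng.choices(rows, k=len(rows))  # resample with replacement
        b0, b1, s2 = em_regression(boot)
        sd = math.sqrt(s2)
        completed.append([
            (x, y if y is not None else b0 + b1 * x + rng.gauss(0.0, sd))
            for x, y in rows
        ])
    return completed
```

Calling `emb_impute(rows, m=10)` on a list of `(x, y)` pairs returns ten completed data sets in which observed values are left unchanged. The bootstrap step is what makes the ten sets differ in their fitted parameters, so that the spread across imputations reflects estimation uncertainty as well as residual noise.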

