Abstract

Retrospective analyses of real-world clinical data face challenges owing to the absence of some data elements. Historically, missing data were addressed by first classifying their presence into one of three categories: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). Imputation techniques continue to be developed and tested to gauge their capacity to mitigate the negative impact of these missing data types on analyses and their results. This study compared two data imputation techniques: probabilistic principal component analysis (PPCA) and multiple imputation using chained equations (MICE). Retrospective data from 41,543 unique patients, including both medical and dental variables (n = 116), were mined from the institutional research data warehouse, which captures data through an integrated medical and dental electronic health record (iEHR). A subset with complete data on all variables of interest was sampled. “Missing data” were then artificially created by randomly removing data elements from this subset. PPCA and MICE were each applied to test their capacity to produce an accurate imputed dataset, and each imputed dataset was compared with the sampled complete subset to determine which technique more closely reproduced the true data. PPCA outperformed MICE, with an overall correct imputation percentage (accuracy) of approximately 65% and a root mean square error (RMSE) of 0.29, compared with approximately 38% accuracy and an RMSE of 0.83 for MICE. Overall, the study concluded that PPCA demonstrated a higher capacity than MICE to impute MCAR data.
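The evaluation protocol described above (start from a complete subset, remove values completely at random, impute, then score the imputed values against the known truth) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the function names are hypothetical, a column-mean imputer stands in for PPCA or MICE, errors are scored only over the removed cells, and because the abstract does not define "correct imputation", a cell is counted as correct here when the imputed value lies within an assumed tolerance of the true value.

```python
import math
import random


def mask_mcar(data, frac, seed=0):
    """Hide a fraction of cells completely at random (MCAR).

    Returns a copy of `data` with hidden cells set to None, plus the
    list of (row, col) positions that were hidden.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(data)) for j in range(len(data[0]))]
    hidden = rng.sample(cells, int(round(frac * len(cells))))
    masked = [list(row) for row in data]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_column_mean(masked):
    """Placeholder imputer (hypothetical stand-in for PPCA or MICE):
    replace each None with the mean of the observed values in its column."""
    n_cols = len(masked[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in masked if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in masked]


def score(true, imputed, hidden, tol=0.5):
    """Return (accuracy %, RMSE) computed over the hidden cells only.

    Assumed rule: a cell is 'correctly imputed' if |imputed - true| <= tol,
    which for 0/1 variables amounts to rounding to the correct class.
    """
    if not hidden:
        raise ValueError("no cells were hidden; nothing to score")
    errors = [imputed[i][j] - true[i][j] for i, j in hidden]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    correct = sum(1 for e in errors if abs(e) <= tol)
    return 100.0 * correct / len(errors), rmse


# Usage on a toy complete subset (two binary columns, one continuous column).
complete = [[1, 0, 3.2], [0, 1, 2.8], [1, 1, 3.0], [0, 0, 3.5]]
masked, hidden = mask_mcar(complete, frac=0.25, seed=42)
imputed = impute_column_mean(masked)
accuracy, rmse = score(complete, imputed, hidden)
print(f"accuracy={accuracy:.1f}%  RMSE={rmse:.3f}")
```

Swapping `impute_column_mean` for a PPCA- or MICE-based imputer while keeping `mask_mcar` and `score` fixed gives a like-for-like comparison, because both techniques are then judged on the same hidden cells.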
