ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data.

Yi-Hui Zhou, Ehsan Saghapour

doi:10.3389/fgene.2021.691274

Open Access

https://doi.org/10.3389/fgene.2021.691274

Journal: Frontiers in Genetics	Publication Date: Jul 2, 2021
Citations: 3	License type: CC BY 4.0

Affiliation: North Carolina State University

Abstract

Electronic health records (EHRs) have been widely adopted in recent years, but often include a high proportion of missing data, which can create difficulties in implementing machine learning and other tools of personalized medicine. Completed datasets are preferred for a number of analysis methods, and successful imputation of missing EHR data can improve interpretation and increase our power to predict health outcomes. However, using the most popular imputation methods generally requires scripting skills, and the methods are implemented in various packages with differing syntax. Thus, the implementation of a full suite of methods is generally out of reach to all except experienced data scientists. Moreover, imputation is often treated as a separate exercise from exploratory data analysis, but should be considered part of the data exploration process. We have created a new graphical tool, ImputEHR, that is built on Python and allows implementation of a range of simple and sophisticated (e.g., gradient-boosted tree-based and neural network) data imputation approaches. In addition to imputation, the tool enables data exploration for informed decision-making, as well as implementing machine learning prediction tools for response data selected by the user. Although the approach works for any missing data problem, the tool is primarily motivated by problems encountered for EHR and other biomedical data. We illustrate the tool using multiple real datasets, providing performance measures of imputation and downstream predictive analysis.

Highlights

Hospitals in the United States have made a concerted effort to transition their health records from paper to digital, the proportion of which has dramatically increased, from 9.4% in 2008 to 75.5% in 2014 (Charles et al., 2013)
We evaluate the effectiveness of various imputation methods on electronic health records (EHRs) and other real-world datasets, and propose a practical and fast imputation method as a hybrid of existing methods
Our experiments show that both ImputeEHR1 and ImputeEHR2 can run the imputation process 20–25 times faster than MissForest while achieving lower root mean squared error (RMSE)

Summary

INTRODUCTION

Hospitals in the United States have made a concerted effort to transition their health records from paper to digital, the proportion of which has dramatically increased, from 9.4% in 2008 to 75.5% in 2014 (Charles et al., 2013). The process of data imputation (artificially replacing missing data with an estimated value) offers a practical work-around so that many downstream data handling steps become feasible. This process preserves all observations by estimating each missing value from other available information. Our focus here is on the practical impact of imputation for downstream analysis, such as EHR-based prediction of important health measures. For such efforts, the emphasis is placed on the success of machine-learning methods, which themselves may involve penalization techniques and estimation known to be biased. We evaluate the effectiveness of various imputation methods on EHR and other real-world datasets, and propose a practical and fast imputation method as a hybrid of existing methods.
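
To make the idea of imputation and its evaluation concrete, here is a minimal sketch in Python. It is not ImputEHR's algorithm: it uses plain column-mean imputation, one of the simplest possible baselines, and scores it with RMSE on entries that were deliberately hidden. The function names `mean_impute` and `masked_rmse` and the toy values are assumptions made purely for illustration.

```python
import math

def mean_impute(rows):
    """Replace None entries with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Fall back to 0.0 for a column with no observed values (an arbitrary choice).
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def masked_rmse(truth, imputed, mask):
    """RMSE computed only over the (row, column) positions that were hidden."""
    squared_errors = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in mask]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy complete data (made-up values, not taken from the paper).
truth = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
mask = [(0, 1), (3, 0)]  # positions hidden to simulate missingness

# Copy the data and blank out the masked positions.
observed = [row[:] for row in truth]
for i, j in mask:
    observed[i][j] = None

imputed = mean_impute(observed)
rmse = masked_rmse(truth, imputed, mask)
print(imputed, rmse)
```

Hiding known values and comparing the imputed entries against them is a common way to compare imputation methods, since the true values of genuinely missing entries are never available. More sophisticated methods would replace `mean_impute` while keeping the same evaluation step.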

MIMIC-III

Datasets From the UCI Machine Learning Repository

METHODS

Imputing Missing Data

Testing Runtimes Between Methods

WEB APPLICATION

Percentage of Missing Rate and Correlation Features Information

Visualization of Missingness Patterns

Imputation Algorithm

Visualization of the Important Features

Visualization of the Phenotype Prediction

CONCLUSIONS

Findings

DATA AVAILABILITY STATEMENT
