The application of unsupervised deep learning in predictive models using electronic health records

Lei Wang, Tim Arnold, Tina Esposito, Darcy Davis, Liping Tong

doi:10.1186/s12874-020-00923-1

Abstract

Background: The main goal of this study is to explore the use of features representing patient-level electronic health record (EHR) data, generated by the unsupervised deep learning algorithm autoencoder, in predictive modeling. Since autoencoder features are unsupervised, this paper focuses on their general lower-dimensional representation of EHR information in a wide variety of predictive tasks.

Methods: We compare the model with autoencoder features to traditional models: a logistic model with the least absolute shrinkage and selection operator (LASSO) and the Random Forest algorithm. In addition, we include a predictive model using a small subset of response-specific variables (Simple Reg) and a model combining these variables with features from the autoencoder (Enhanced Reg). We performed the study first on simulated data that mimics real-world EHR data and then on actual EHR data from eight Advocate hospitals.

Results: On simulated data with incorrect categories and missing data, the precision of the autoencoder model is 24.16% with recall fixed at 0.7, which is higher than Random Forest (23.61%) and lower than LASSO (25.32%). The precision is 20.92% for Simple Reg and improves to 24.89% for Enhanced Reg. When using real EHR data to predict the 30-day readmission rate, the precision of the autoencoder model is 19.04%, which again is higher than Random Forest (18.48%) and lower than LASSO (19.70%). The precisions for Simple Reg and Enhanced Reg are 18.70% and 19.69%, respectively. That is, Enhanced Reg can achieve prediction performance competitive with LASSO. In addition, the results show that Enhanced Reg usually relies on fewer features under the simulation settings of this paper.

Conclusions: We conclude that an autoencoder can create useful features that represent the entire space of EHR data and are applicable to a wide array of predictive tasks. Together with important response-specific predictors, we can derive efficient and robust predictive models with less labor in data extraction and model training.
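The idea of compressing a patient-level record into a few unsupervised features and then adding response-specific variables (the Enhanced Reg design) can be illustrated with a toy sketch. This is not the authors' implementation: the one-hidden-layer architecture, the tanh activation, the learning rate, the synthetic binary data, and the `prior_admissions` variable are all hypothetical choices made only for illustration, since the abstract does not specify them.

```python
import math
import random


def encode(params, x):
    """Map one input row to its hidden-layer (autoencoder) features."""
    W1, b1 = params
    return [math.tanh(sum(w * xi for w, xi in zip(row_w, x)) + b)
            for row_w, b in zip(W1, b1)]


def train_autoencoder(rows, n_hidden, epochs=200, lr=0.05, seed=0):
    """Fit encoder h = tanh(W1 x + b1) and decoder x_hat = W2 h + b2 with plain SGD.

    Returns the encoder parameters and the mean squared reconstruction
    error observed during each epoch.
    """
    rng = random.Random(seed)
    n_in = len(rows[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    b2 = [0.0] * n_in
    history = []
    for _ in range(epochs):
        total = 0.0
        for x in rows:
            h = encode((W1, b1), x)
            x_hat = [sum(w * hj for w, hj in zip(W2[i], h)) + b2[i] for i in range(n_in)]
            err = [x_hat[i] - x[i] for i in range(n_in)]
            total += sum(e * e for e in err) / n_in
            # Gradient of 0.5 * sum(err^2) with respect to the pre-activation of h.
            grad_h = [sum(err[i] * W2[i][j] for i in range(n_in)) * (1.0 - h[j] ** 2)
                      for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    W2[i][j] -= lr * err[i] * h[j]
                b2[i] -= lr * err[i]
            for j in range(n_hidden):
                for k in range(n_in):
                    W1[j][k] -= lr * grad_h[j] * x[k]
                b1[j] -= lr * grad_h[j]
        history.append(total / len(rows))
    return (W1, b1), history


# Synthetic stand-in for EHR indicators: six binary columns driven by two hidden factors.
data_rng = random.Random(1)
rows = []
for _ in range(40):
    a, b = data_rng.randint(0, 1), data_rng.randint(0, 1)
    rows.append([a, a, a, b, b, b])

encoder, history = train_autoencoder(rows, n_hidden=2)
features = [encode(encoder, x) for x in rows]

# Enhanced-Reg-style design matrix: learned features plus one
# hypothetical response-specific variable.
prior_admissions = [data_rng.randint(0, 3) for _ in rows]
enhanced = [f + [p] for f, p in zip(features, prior_admissions)]
```

The rows of `enhanced` could then be passed to any regression model; the same `features` can be reused for a different response without retraining the encoder, which is the practical appeal of unsupervised features.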

Highlights

The main goal of this study is to explore the use of features representing patient-level electronic health record (EHR) data, generated by the unsupervised deep learning algorithm autoencoder, in predictive modeling.
Figures of Autoencoder and Random Forest are closely matched by the numbers of Enhanced Reg and least absolute shrinkage and selection operator (LASSO), which is consistent with the finding in [29] that performances of well-established predictive models tend to be similar when the sample size is large.
Positive predictive value (PPV) and area under the receiver operating characteristic curve (AUC) of Enhanced Reg remain roughly unchanged in the presence of categorization and missing data, and stand at 24.89, 21.25%, 0.756 in scenario 4, respectively.
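The two kinds of metric quoted here, precision (positive predictive value) at a fixed recall and AUC, can be computed from predicted scores as in the following minimal sketch. The function names and the toy labels and scores are assumptions for illustration; the paper's exact evaluation code is not shown in this excerpt, and tied scores are handled only in the AUC function.

```python
def precision_at_recall(y_true, scores, target_recall=0.7):
    """Precision at the highest score cutoff whose recall reaches target_recall."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Walk down the ranking from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    for rank, i in enumerate(order, start=1):
        true_pos += y_true[i]
        if true_pos / n_pos >= target_recall:
            return true_pos / rank
    raise ValueError("target recall cannot be reached")


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 4 positives among 10 cases, scores already sorted for readability.
y = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
ppv = precision_at_recall(y, s, target_recall=0.7)  # 3 of the top 4 are positive: 0.75
area = auc(y, s)                                    # 19 of 24 pairs ordered correctly
```

Fixing recall before reading off precision makes models comparable at the same sensitivity, which is why a single percentage per model can be reported.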

Summary

Introduction

The main goal of this study is to explore the use of features representing patient-level electronic health record (EHR) data, generated by the unsupervised deep learning algorithm autoencoder, in predictive modeling. Since autoencoder features are unsupervised, this paper focuses on their general lower-dimensional representation of EHR information in a wide variety of predictive tasks. The dramatic increase in EHR data provides many novel opportunities to capture the association between patient outcomes and clinical treatments, while pushing the dimensionality and complexity of the data to a point where some classical predictive models may fail.

Wang et al. BMC Medical Research Methodology (2020) 20:37

... variable selection. Machine learning procedures such as Random Forest [7] have been successfully applied to various practical problems. Operating on the divide-and-conquer principle, Random Forest achieves remarkably good results by averaging the outputs of a predefined number of randomized individual decision trees while requiring very little tuning [8].
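The averaging of randomized trees described above can be sketched in a few lines. This is a deliberately simplified illustration, not the Random Forest implementation used in the study: each "tree" is only a depth-1 stump fitted on a bootstrap sample with one randomly chosen feature, the split criterion is squared error, and the data and function names are hypothetical.

```python
import random


def fit_stump(rows, labels, feature):
    """Depth-1 tree on one feature: choose the threshold with the lowest squared error."""
    best = None
    for t in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        p_left = sum(left) / len(left)  # left is never empty: t is an observed value
        p_right = sum(right) / len(right) if right else p_left
        sse = (sum((y - p_left) ** 2 for y in left)
               + sum((y - p_right) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, p_left, p_right)
    _, t, p_left, p_right = best
    return feature, t, p_left, p_right


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        feature = rng.randrange(n_features)          # randomize the split variable
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest


def predict_proba(forest, row):
    """Average the leaf estimates of all trees."""
    votes = [p_left if row[f] <= t else p_right for f, t, p_left, p_right in forest]
    return sum(votes) / len(votes)


# Toy data: feature 0 drives the label, feature 1 is unrelated noise.
rows = [[i, (i * 7) % 5] for i in range(10)]
labels = [0] * 5 + [1] * 5
forest = fit_forest(rows, labels)
low_risk = predict_proba(forest, [0, 0])
high_risk = predict_proba(forest, [9, 0])
```

Individual stumps built on the noise feature are uninformative, but averaging over many randomized trees lets the informative splits dominate, so `high_risk` ends up above `low_risk`.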

Journal: BMC Medical Research Methodology
Publication Date: Feb 26, 2020
Citations: 19
License type: open-access
