Abstract
Background: The main goal of this study is to explore the use of features representing patient-level electronic health record (EHR) data, generated by the autoencoder, an unsupervised deep learning algorithm, in predictive modeling. Since autoencoder features are unsupervised, this paper focuses on their general lower-dimensional representation of EHR information in a wide variety of predictive tasks.

Methods: We compare the model with autoencoder features to traditional models: a logistic regression model with the least absolute shrinkage and selection operator (LASSO) and the Random Forest algorithm. In addition, we include a predictive model using a small subset of response-specific variables (Simple Reg) and a model combining these variables with features from the autoencoder (Enhanced Reg). We performed the study first on simulated data that mimics real-world EHR data and then on actual EHR data from eight Advocate hospitals.

Results: On simulated data with incorrect categories and missing data, the precision for the autoencoder is 24.16% when fixing recall at 0.7, which is higher than Random Forest (23.61%) and lower than LASSO (25.32%). The precision is 20.92% for Simple Reg and improves to 24.89% for Enhanced Reg. When using real EHR data to predict the 30-day readmission rate, the precision of the autoencoder is 19.04%, which again is higher than Random Forest (18.48%) and lower than LASSO (19.70%). The precisions for Simple Reg and Enhanced Reg are 18.70% and 19.69%, respectively. That is, Enhanced Reg can achieve prediction performance competitive with LASSO. In addition, results show that Enhanced Reg usually relies on fewer features in the simulation settings of this paper.

Conclusions: We conclude that the autoencoder can create useful features that represent the entire space of EHR data and are applicable to a wide array of predictive tasks. Together with important response-specific predictors, we can derive efficient and robust predictive models with less labor in data extraction and model training.
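The results above report precision with recall fixed at 0.7. One way such a metric could be computed is sketched below, under the assumption that patients are ranked by predicted risk and the cutoff is lowered until at least 70% of true positives are captured. The function name `precision_at_recall` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
def precision_at_recall(y_true, scores, target_recall=0.7):
    """Precision at the highest score cutoff whose recall reaches target_recall.

    y_true: list of 0/1 labels; scores: predicted risks (higher = riskier).
    Illustrative sketch only; the paper's exact procedure may differ.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank patients from highest to lowest predicted risk.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = 0
    i = 0
    while i < len(ranked):
        # Patients tied at the same score are flagged together.
        cutoff = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == cutoff:
            tp += ranked[i][1]
            i += 1
        if tp / total_pos >= target_recall:
            return tp / i  # true positives / patients flagged so far
    return tp / len(ranked)


# Toy data (hypothetical): 4 positives among 10 patients.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(precision_at_recall(y_true, scores))  # 3 of the top 4 are positive -> 0.75
```

Fixing recall makes precision comparable across models, because every model is evaluated at the point where it captures the same share of true positives.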
Highlights
The main goal of this study is to explore the use of features representing patient-level electronic health record (EHR) data, generated by the autoencoder, an unsupervised deep learning algorithm, in predictive modeling
The results for Autoencoder and Random Forest are closely matched by those for Enhanced Reg and least absolute shrinkage and selection operator (LASSO), which is consistent with the finding in [29] that well-established predictive models tend to perform similarly when the sample size is large
Precision (at a recall of 0.7), positive predictive value (PPV), and area under the receiver operating characteristic curve (AUC) of Enhanced Reg remain roughly unchanged in the presence of categorization and missing data, standing at 24.89%, 21.25%, and 0.756, respectively, in scenario 4
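The highlights above quote PPV and AUC. As a minimal sketch of how these two metrics are commonly defined, the code below computes PPV at a score threshold and AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half. The threshold of 0.5 and the toy data are assumptions for illustration; this text does not state which cutoff the paper uses for PPV.

```python
def ppv(y_true, scores, threshold):
    """Positive predictive value: share of flagged patients who are true positives."""
    flagged = [y for y, s in zip(y_true, scores) if s >= threshold]
    if not flagged:
        raise ValueError("no patient is flagged at this threshold")
    return sum(flagged) / len(flagged)


def auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 4 positives among 10 patients.
labels = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
risks = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(ppv(labels, risks, threshold=0.5))  # 3 of 5 flagged patients are positive
print(auc(labels, risks))                 # 19 of 24 positive/negative pairs are ordered correctly
```

The pairwise form is quadratic in the number of patients, which is fine for illustration; a rank-based computation would be preferable on data the size of a real EHR cohort.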
Summary
Wang et al. BMC Medical Research Methodology (2020) 20:37

The main goal of this study is to explore the use of features representing patient-level electronic health record (EHR) data, generated by the autoencoder, an unsupervised deep learning algorithm, in predictive modeling. Since autoencoder features are unsupervised, this paper focuses on their general lower-dimensional representation of EHR information in a wide variety of predictive tasks.

The dramatic growth of EHR data provides many novel opportunities to capture the association between patient outcomes and clinical treatments, while pushing the dimensionality and complexity of the data to a state where some classical predictive models may fail. ... variable selection. Machine learning procedures such as Random Forest [7] have been successfully implemented in various practical problems. Operating on the divide-and-conquer principle, Random Forest exhibits remarkably good results by averaging the results obtained from a predefined number of randomized individual decision trees while requiring very little tuning [8].
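The averaging idea behind Random Forest described above can be illustrated with a deliberately simplified sketch. It assumes depth-one trees (decision stumps), each fit on a bootstrap sample with a random subset of features, and averages their predicted probabilities. A real Random Forest grows deeper trees and resamples features at every split; the helper names and toy data here are hypothetical and this is not the implementation used in the paper.

```python
import random


def fit_stump(X, y, feature_ids):
    """Best single threshold split (lowest squared error) among feature_ids."""
    best = None
    for j in feature_ids:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            # Each side predicts its fraction of positives (0.5 if empty).
            p_left = sum(left) / len(left) if left else 0.5
            p_right = sum(right) / len(right) if right else 0.5
            err = (sum((yi - p_left) ** 2 for yi in left)
                   + sum((yi - p_right) ** 2 for yi in right))
            if best is None or err < best[0]:
                best = (err, j, t, p_left, p_right)
    _, j, t, p_left, p_right = best
    return j, t, p_left, p_right


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Fit n_trees randomized stumps on bootstrap samples."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample of rows
        feats = rng.sample(range(d), n_features)     # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_proba(forest, row):
    """Average the individual trees' predicted probabilities."""
    votes = [p_left if row[j] <= t else p_right for j, t, p_left, p_right in forest]
    return sum(votes) / len(votes)


# Toy data (hypothetical): two features, both separating the classes.
X = [[0, 0], [1, 1], [2, 1], [3, 2], [6, 7], [7, 8], [8, 9], [9, 9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_proba(forest, [9, 9]), predict_proba(forest, [0, 0]))
```

Any single stump can be poor when its bootstrap sample is unrepresentative; averaging over many randomized stumps dampens that variance.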