Abstract

Prognostic modelling is important in clinical practice and epidemiology for patient management and research. Electronic health records (EHR) provide large quantities of data for such models, but conventional epidemiological approaches require significant researcher time to implement. Expert selection of variables, fine-tuning of variable transformations and interactions, and imputation of missing values are time-consuming and could bias subsequent analysis, particularly because missingness in EHR is both high and may carry meaning. Using a cohort of 80,000 patients from the CALIBER programme, we compared traditional modelling and machine-learning approaches in EHR. First, we used Cox models and random survival forests, with and without imputation, on 27 expert-selected, preprocessed variables to predict all-cause mortality. We then used Cox models, random forests and elastic net regression on an extended dataset of 586 variables to build prognostic models and identify novel prognostic factors without prior expert input. We observed that data-driven models applied to the extended dataset can outperform conventional models for prognosis, without data preprocessing or imputation of missing values. An elastic net Cox regression using 586 unimputed variables with continuous values discretised achieved a C-index of 0.801 (bootstrapped 95% CI 0.799 to 0.802), compared with 0.793 (0.791 to 0.794) for a traditional Cox model comprising 27 expert-selected variables with imputation for missing values. We also found that data-driven models allow identification of novel prognostic variables; that the absence of values for particular variables carries meaning and can have significant implications for prognosis; and that variables often have a nonlinear association with mortality, which discretised Cox models and random forests can elucidate.
This demonstrates that machine-learning approaches applied to raw EHR data can be used to build models for use in research and clinical practice, and identify novel predictive variables and their effects to inform future research.
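As a concrete illustration of the performance metric quoted above: Harrell's C-index is the fraction of comparable patient pairs in which the model assigns the higher risk score to the patient who dies first. A minimal sketch follows, simplified to ignore tied event times; the data are invented for illustration and are not from the study:

```python
import numpy as np

def harrell_c(time, event, risk):
    """Harrell's C-index: among comparable pairs (one subject with an
    observed event, the other surviving longer), count the fraction in
    which the earlier-failing subject has the higher risk score.
    Ties in risk score count as 0.5."""
    n = len(time)
    concordant, ties, comparable = 0, 0, 0
    for i in range(n):
        if not event[i]:
            continue  # only an observed event can anchor a comparable pair
        for j in range(n):
            if time[j] > time[i]:  # subject j outlived subject i
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    ties += 1
    return (concordant + 0.5 * ties) / comparable

# toy example: risk scores perfectly ordered against survival time
time = np.array([2.0, 4.0, 6.0, 8.0])
event = np.array([1, 1, 0, 1])  # third subject is censored
risk = np.array([0.9, 0.7, 0.4, 0.1])
print(harrell_c(time, event, risk))  # prints 1.0 (perfect concordance)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the difference reported above (0.801 vs 0.793) is a modest but consistent gain in discrimination.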

Highlights

  • Advances in precision medicine will require increasingly individualised prognostic assessments for patients in order to guide appropriate therapy

  • We initially identified 115,305 patients with coronary disease in CALIBER and, after excluding patients based on criteria relating to the timing of the diagnosis and follow-up, 82,197 patients remained in the cohort

  • We have demonstrated that machine-learning approaches applied to routine electronic health record (EHR) data can achieve comparable or better performance for risk prediction than manually optimised models built on expert-selected, imputed data

Introduction

Advances in precision medicine will require increasingly individualised prognostic assessments for patients in order to guide appropriate therapy. Electronic health records (EHR) contain large amounts of information about patients’ medical history, including symptoms, examination findings, test results, prescriptions and procedures. Although EHR are a rich data source, many of the data items are collected in a non-systematic manner according to clinical need, so missingness is often high [6, 7]. The population of patients with missing data may be systematically different depending on the reason that the data are missing [8]: tests may be omitted if the clinician judges they are not necessary [9], the patient refuses [10], or the patient fails to attend. Multiple imputation to handle missing data can be computationally intensive, and needs to include sufficient information about the reason for missingness to avoid bias [11].
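The discretisation strategy described in the abstract sidesteps imputation entirely: each continuous variable is binned, and missing values become their own category, so informative missingness enters the model directly as a predictor level. A minimal sketch using pandas (the function name and data are illustrative, not taken from the paper):

```python
import numpy as np
import pandas as pd

def discretise_with_missing(series, n_bins=4, missing_label="missing"):
    """Discretise a continuous variable into quantile bins, keeping
    missing values as their own explicit category (no imputation)."""
    bins = pd.qcut(series, q=n_bins, duplicates="drop")
    cats = bins.cat.add_categories([missing_label]).fillna(missing_label)
    return cats.astype(str)

# hypothetical lab value where 10% of patients were never tested
rng = np.random.default_rng(0)
values = pd.Series(rng.normal(5.0, 1.0, size=100))
values.iloc[::10] = np.nan  # every tenth patient has no measurement

cats = discretise_with_missing(values)
# one indicator column per bin plus one for "missing"; these columns can
# feed a penalised Cox model, letting the missing level carry its own effect
dummies = pd.get_dummies(cats, prefix="lab")
```

Because the missing level receives its own coefficient, a penalised Cox fit on these indicators can estimate the prognostic effect of the test not having been done, rather than assuming the value is missing at random.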
