Comparison of machine-learning and logistic regression models for prediction of 30-day unplanned readmission in electronic health records: A development and validation study.

Masao Iwagami, Ryota Inokuchi, Eiryo Kawakami, Tomohide Yamada, Atsushi Goto, Toshiki Kuno, Yohei Hashimoto, Nobuaki Michihata, Tadahiro Goto, Tomohiro Shinozaki, Yu Sun, Yuta Taniguchi, Jun Komiyama, Kazuaki Uda, Toshikazu Abe, Nanako Tamiya

doi:10.1371/journal.pdig.0000578

Abstract

Machine-learning models are expected to outperform regression models such as logistic regression (LR), especially as the number and types of predictor variables in electronic health records (EHRs) increase, but whether they do so is unknown. We aimed to compare the predictive performance of gradient-boosted decision tree (GBDT), random forest (RF), deep neural network (DNN), and LR with the least absolute shrinkage and selection operator (LR-LASSO) for unplanned readmission. We used EHRs of patients discharged alive from 38 hospitals, with discharges in 2015-2017 for derivation and in 2018 for validation. The data included basic characteristics; diagnosis, surgery, procedure, and drug codes; and blood-test results. The outcome was 30-day unplanned readmission. We created six patterns of data tables with different numbers of binary variables (those present in ≥5% of patients, ≥1% of patients, or ≥10 patients), each with and without blood-test results. For each pattern, we used the derivation data to fit the machine-learning and LR models and the validation data to evaluate the performance of each model. The incidence of the outcome was 6.8% (23,108/339,513 discharges) in the derivation dataset and 6.4% (7,507/118,074 discharges) in the validation dataset. For the first data table, with the smallest number of variables (102 variables present in ≥5% of patients, without blood-test results), the c-statistic was highest for GBDT (0.740), followed by RF (0.734), LR-LASSO (0.720), and DNN (0.664). For the last data table, with the largest number of variables (1,543 variables present in ≥10 patients, including blood-test results), the c-statistic was highest for GBDT (0.764), followed by LR-LASSO (0.755), RF (0.751), and DNN (0.720); here the difference between GBDT and LR-LASSO was small, and their 95% confidence intervals overlapped.
In conclusion, GBDT generally outperformed LR-LASSO in predicting unplanned readmission, but the difference in c-statistic narrowed as the number of variables increased and blood-test results were added.
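
Two steps described in the abstract can be illustrated with a minimal Python sketch: keeping only the binary variables that meet a prevalence threshold (≥5%, ≥1%, or ≥10 patients), and computing the c-statistic as the fraction of event/non-event pairs that a model ranks correctly. This is not the authors' code. The function names, the toy column names, and the toy data are hypothetical and chosen only for illustration, and the pairwise loop is written for clarity rather than for datasets of the size reported here.

```python
from typing import Dict, List, Optional, Sequence


def select_binary_columns(
    table: Dict[str, Sequence[int]],
    min_fraction: Optional[float] = None,
    min_count: Optional[int] = None,
) -> List[str]:
    """Return names of 0/1 columns whose number of 1s meets the threshold(s)."""
    kept = []
    for name, values in table.items():
        positives = sum(values)  # number of patients who have this code
        if min_fraction is not None and positives < min_fraction * len(values):
            continue
        if min_count is not None and positives < min_count:
            continue
        kept.append(name)
    return kept


def c_statistic(outcomes: Sequence[int], scores: Sequence[float]) -> float:
    """Share of (event, non-event) pairs in which the event has the higher
    predicted score; tied scores count as half a concordant pair."""
    events = [s for y, s in zip(outcomes, scores) if y == 1]
    non_events = [s for y, s in zip(outcomes, scores) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical toy table: 10 patients, two binary code indicators.
toy = {
    "dx_heart_failure": [1, 1, 0, 0, 1, 0, 0, 0, 0, 0],  # present in 3 of 10
    "rx_rare_drug": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],      # present in 1 of 10
}

if __name__ == "__main__":
    print(select_binary_columns(toy, min_fraction=0.2))
    print(select_binary_columns(toy, min_count=1))
    print(c_statistic([1, 0, 1, 0, 0], [0.9, 0.2, 0.4, 0.4, 0.1]))
```

A c-statistic of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which gives a scale for reading the reported range of 0.664 to 0.764. For data of the size in this study, a rank-based implementation from an established library would be the more practical choice.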

Journal: PLOS Digital Health
Publication Date: Aug 20, 2024
License type: CC BY 4.0
