Abstract

This study aimed to predict the mortality of patients with coronavirus disease 2019 (COVID-19). We collected clinical data of COVID-19 patients between January 18 and March 29, 2020 in Wuhan, China. A gradient boosting decision tree (GBDT) model, a logistic regression (LR) model, and a simplified LR model (LR-5) were built to predict the mortality of COVID-19. We evaluated the models by computing area under the curve (AUC), accuracy, positive predictive value (PPV), and negative predictive value (NPV) under fivefold cross-validation. A total of 2924 patients were included in our evaluation, of whom 257 (8.8%) died and 2667 (91.2%) survived during hospitalization. Upon admission, there were 21 (0.7%) mild cases, 2051 (70.1%) moderate cases, 779 (26.6%) severe cases, and 73 (2.5%) critically severe cases. The GBDT model exhibited the highest fivefold AUC (0.941), followed by LR (0.928) and LR-5 (0.913). The diagnostic accuracies of GBDT, LR, and LR-5 were 0.889, 0.868, and 0.887, respectively. In particular, the GBDT model demonstrated the highest sensitivity (0.899) and specificity (0.889). The NPV of all three models exceeded 97%, while their PPV values were relatively low: 0.381 for LR, 0.402 for LR-5, and 0.432 for GBDT. For severe and critically severe cases, the GBDT model also performed best, with a fivefold AUC of 0.918. In the external validation test of the LR-5 model using 72 cases of COVID-19 from Brunei, leukomonocyte (%) showed the highest fivefold AUC (0.917), followed by urea (0.867), age (0.826), and SpO2 (0.704). The findings confirm that the mortality prediction performance of the GBDT model is better than that of the LR models in confirmed cases of COVID-19. The performance comparison seems independent of disease severity.

Supplementary Information: The online version contains supplementary material available at 10.1007/s00521-020-05592-1.
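The gap between the high NPV and the low PPV reported above follows largely from the low mortality rate in the cohort. The sketch below is not the authors' code. It applies Bayes' rule to the reported GBDT sensitivity, specificity, and cohort mortality to show the effect. The function name `predictive_values` is hypothetical, and treating the pooled cohort mortality as the prevalence in every fold is a simplifying assumption.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence via Bayes' rule."""
    # Expected fractions of the whole cohort falling in each confusion-matrix cell.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Values taken from the abstract: 257 deaths among 2924 patients,
# GBDT sensitivity 0.899 and specificity 0.889.
prevalence = 257 / 2924
ppv, npv = predictive_values(0.899, 0.889, prevalence)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Under these assumptions the PPV comes out near 0.44 and the NPV near 0.99, in line with the reported 0.432 and the NPV above 97%. The small difference from the reported PPV is expected, since the published figures are computed per fold rather than from pooled rates.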

Highlights

  • Coronavirus Disease 2019 (COVID-19) is a new form of respiratory disorder caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1]

  • Gradient boosting decision tree (GBDT), logistic regression (LR) model, and simplified LR were built to predict the mortality of COVID-19

  • The findings confirm that the mortality prediction performance of the gradient boosting decision tree (GBDT) is better than that of the LR models in confirmed cases of COVID-19

Summary

Introduction

Coronavirus Disease 2019 (COVID-19) is a new form of respiratory disorder caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1]. A host of existing studies target the prediction of COVID-19 disease progression [6], but most prediction models for disease progression come from single-center studies with small sample sizes (26–577 cases). These models were developed with multivariable logistic regression [6,7,8,9], which may lead to an increased risk of overfitting. To bridge the gap between machine learning and COVID-19 prognosis, we propose to develop and evaluate a variety of relevant machine learning models for predicting the mortality of patients with COVID-19. Our experimental results show that all models are capable of achieving good performance in mortality prediction for confirmed COVID-19 patients. The GBDT model exhibits superior performance in mortality prediction in comparison to LR and LR-5.

Problem statement
GBDT modeling
Algorithm for generating regression tree
Study population and data sources
Data collection
Features extraction and selection
Baseline characteristics of patients
Comparisons of different models in the full cohort
Performance of models in COVID-19 patients with different disease severity
External validation test in 72 patients in Brunei
Discussion
Findings
Compliance with ethical standards