Abstract

This study aimed to establish a novel pulmonary embolism (PE) risk prediction model based on machine learning (ML) methods and to evaluate both its predictive performance and the contribution of individual variables to that performance. We conducted a retrospective study at the Shanghai Tenth People's Hospital and collected the clinical data of inpatients who underwent pulmonary computed tomography imaging between January 1, 2014 and December 31, 2018. We trained several ML models, including logistic regression (LR), support vector machine (SVM), random forest (RF), and gradient boosting decision tree (GBDT), compared them with representative baseline algorithms, and investigated their predictive performance and feature interpretation. A total of 3619 patients were included in the study. The GBDT model gave the best prediction, with an area under the curve (AUC) of 0.799, whereas the AUCs of the RF, LR, and SVM models were 0.791, 0.716, and 0.743, respectively. The sensitivities of the GBDT, LR, RF, and SVM models were 63.9%, 68.1%, 71.5%, and 75%, respectively; the specificities were 81.1%, 66.1%, 72.7%, and 65.1%, respectively; and the accuracies were 77.8%, 66.5%, 72.5%, and 67%, respectively. The maximum D-dimer level contributed the most to the outcome prediction, followed by the extreme growth rate of the plasma fibrinogen level, in-hospital duration, and the extreme growth rate of the D-dimer level. The study demonstrates the superiority of the GBDT model in predicting the risk of PE in hospitalized patients. However, before the model can be applied in clinical practice to support clinical decision-making, its predictive performance needs to be prospectively verified.
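For readers less familiar with the reported metrics, the following is a minimal sketch of how sensitivity, specificity, and accuracy are derived from binary predictions. It is not the study's code: the function names and the toy labels are hypothetical and chosen only for illustration, and it assumes labels are coded as 1 (PE) and 0 (no PE) with both classes present.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary 0/1 labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Return sensitivity, specificity, and accuracy as fractions.

    Assumes at least one positive and one negative case in y_true,
    so neither denominator is zero.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # share of true PE cases detected
        "specificity": tn / (tn + fp),  # share of non-PE cases ruled out
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical toy data, not taken from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
```

Because accuracy weights both classes by their frequency, a model with high specificity can reach high accuracy on a cohort where most patients do not have PE even when its sensitivity is comparatively low.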
