Assessment of pulmonary embolism probability using a machine learning model

D V Gavrilov, A E Andreichenko, A V Gusev, T Yu Kuznetsova, A D Ermak

doi:10.15829/1560-4071-2024-5679

Abstract

Aim. To develop and validate a machine learning model that identifies suspected pulmonary embolism (PE) from clinical features recorded in the electronic health records (EHRs) of outpatients and inpatients.

Material and methods. Data from 19,730 patients from 7 Russian regions were analyzed. The EHR data covered the period from March 21, 2007 to February 4, 2022. Complaints, clinical and laboratory data, and concomitant diseases were used as diagnostic features. PE was diagnosed in 1,379 patients; the diagnosis was established from ICD-10 codes. Seven machine learning algorithms were applied to diagnose PE: XGBoost, LightGBM, CatBoost, Logistic Regression, MLP Classifier, Random Forest Classifier, and Gradient Boosting Classifier.

Results. The Gradient Boosting Classifier-based model was selected for further prospective testing, with a sensitivity of 0.899 (95% confidence interval (CI), 0.864-0.932), a specificity of 0.875 (95% CI, 0.863-0.86), and an area under the ROC curve of 0.952 (95% CI, 0.938-0.964). The following features had the greatest predictive value: cough, respiratory disorders, blood creatinine, body temperature, general weakness, heart rate, respiratory rate, edema, antihypertensive therapy, oxygen saturation, and age.

Conclusion. The model is intended for the initial encounter with patients who present with complaints and suspected PE, regardless of the type of care.
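For readers unfamiliar with how metrics such as those in the Results are obtained, the following is a minimal sketch of computing sensitivity and specificity with a 95% CI. It assumes a percentile bootstrap, which the abstract does not confirm as the authors' method; the function names, the resample count, and the toy labels are all hypothetical and are not study data.

```python
import random


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def percentile_interval(values, alpha):
    """Lower and upper percentile bounds of a list of bootstrap estimates."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(alpha / 2 * last)], ordered[int((1 - alpha / 2) * last)]


def bootstrap_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CIs for sensitivity and specificity (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    sens_values, spec_values = [], []
    for _ in range(n_resamples):
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        yp = [y_pred[i] for i in idx]
        # Skip resamples that contain only one class.
        if 0 < sum(yt) < n:
            s, c = sensitivity_specificity(yt, yp)
            sens_values.append(s)
            spec_values.append(c)
    return (percentile_interval(sens_values, alpha),
            percentile_interval(spec_values, alpha))


# Toy labels for illustration only: 20 PE cases, 80 non-cases.
y_true = [1] * 20 + [0] * 80
y_pred = [1] * 18 + [0] * 2 + [0] * 70 + [1] * 10

sens, spec = sensitivity_specificity(y_true, y_pred)
sens_ci, spec_ci = bootstrap_ci(y_true, y_pred)
print(f"sensitivity {sens:.3f}, 95% CI {sens_ci[0]:.3f}-{sens_ci[1]:.3f}")
print(f"specificity {spec:.3f}, 95% CI {spec_ci[0]:.3f}-{spec_ci[1]:.3f}")
```

In a percentile bootstrap the lower bound of an interval cannot exceed the upper bound, which makes it easy to sanity-check reported intervals.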
