A hybrid machine learning/deep learning COVID-19 severity predictive model from CT images and clinical data

Matteo Chieregato, Stefania Nici, Chiara Bassetti, Mauro Morassi, Fabio Frangiamore, Claudia Baresi, Marco Galelli, Claudio Bnà

doi:10.1038/s41598-022-07890-1

Open Access

https://doi.org/10.1038/s41598-022-07890-1

Abstract

COVID-19 clinical presentation and prognosis are highly variable, ranging from asymptomatic and paucisymptomatic cases to acute respiratory distress syndrome and multi-organ involvement. We developed a hybrid machine learning/deep learning model to classify patients into two outcome categories, non-ICU and ICU (intensive care admission or death), using 558 patients admitted to a hospital in northern Italy between February and May 2020. A fully 3D patient-level CNN classifier on baseline CT images is used as a feature extractor. The extracted features, together with laboratory and clinical data, are passed to a Boruta feature selection algorithm that uses SHAP game-theoretic values. A classifier is then built on the reduced feature space with the CatBoost gradient boosting algorithm, reaching a probabilistic AUC of 0.949 on a holdout test set. The model aims to provide clinical decision support to medical doctors, giving both the probability score of belonging to an outcome class and a case-based SHAP interpretation of feature importance.
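As a reading aid for the AUC figure quoted above, the following minimal sketch computes the area under the ROC curve directly from predicted probabilities, using its rank interpretation: the probability that a randomly chosen ICU case receives a higher score than a randomly chosen non-ICU case. The function name `roc_auc` and the six labels and scores are hypothetical illustrations, not the study's code or data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class (here: ICU), 0 for the negative class.
    scores: predicted probability of the positive class.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 non-ICU (0) and 3 ICU (1) patients.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.35, 0.40, 0.55, 0.80, 0.90]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a cohort of a few hundred cases; a sort-based formulation would be preferable for larger sets.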

Highlights

To date (May 2021), more than one hundred million individuals have been reported as affected by COVID-19
Radiological information is natively imaging data, while laboratory and clinical information comes in tabular form
We built a COVID-19 prognostic hybrid machine-learning/deep learning model intended to be usable as a tool that can support clinical decision making

Summary

Introduction

To date (May 2021), more than one hundred million individuals have been reported as affected by COVID-19. From the beginning of the infection, it was apparent that COVID-19 encompasses a wide spectrum of both clinical presentations and consequent prognosis, with cases of sudden, unexpected evolution (and worsening) of the clinical and radiological picture[1]. Such elements of variability and instability are still not fully explained, with an important role advocated for a multiplicity of pathophysiological processes[2–4].

The Shapley values are a fair distribution of the payout among players, i.e. of the prediction result among features. In this way, both synthetic (percentage score) and analytic (SHAP values) information is provided to the judgement of the clinician.
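To make the "fair distribution of the payout" concrete, the sketch below computes exact Shapley values for a three-feature toy model by enumerating all feature coalitions. Everything in it is an assumption for illustration: the model `toy_model`, the case `x`, and the all-zero `baseline` are hypothetical, and features outside a coalition are simply replaced by their baseline value, which is a simplification of what the SHAP library does with background data.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition of the other features."""
    players = list(range(n_features))
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_model(x):
    # Hypothetical model: two linear terms plus one interaction term.
    return 2.0 * x[0] + 1.0 * x[1] + 3.0 * x[0] * x[2]


x = [1.0, 2.0, 1.0]          # hypothetical case to explain
baseline = [0.0, 0.0, 0.0]   # hypothetical reference input


def value_fn(coalition):
    # Features in the coalition keep their real value, the rest take the baseline.
    masked = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
    return toy_model(masked)


phi = shapley_values(value_fn, len(x))
print(phi)
```

The contributions add up to the difference between the prediction for the case and the prediction for the baseline, and the interaction term is shared equally between the two features involved. Exact enumeration grows exponentially with the number of features, so it is only practical for very small examples like this one.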

Methods

Results

Discussion

Conclusion