Improving and Externally Validating Mortality Prediction Models for COVID-19 Using Publicly Available Data

Avishek Chatterjee, Henry Woodruff, Philippe Lambin, Guus Wilmink

doi:10.3390/biomed2010002


Open Access

https://doi.org/10.3390/biomed2010002


Journal: BioMed	Publication Date: Jan 5, 2022
Citations: 3	License type: CC BY 4.0

Affiliation: Maastricht University

Abstract

We conducted a systematic survey of the COVID-19 endpoint prediction literature to (a) identify publications that include data adhering to the FAIR (findability, accessibility, interoperability, and reusability) principles and (b) develop and reuse mortality prediction models that best generalize to these datasets. The largest such cohort dataset known to us was used for model development. The associated published prediction model was subjected to recursive feature elimination to find a minimal logistic regression model with statistically and clinically indistinguishable predictive performance. This minimal model still could not be applied to all four external validation sets that were identified, because some of those sets completely lacked features the model requires. Thus, a generalizable model (GM) was built that could be applied to all four external validation sets. An age-only model was used as a benchmark, as it is the simplest effective and robust predictor of mortality currently known in the COVID-19 literature. The GM surpassed the age-only model in three external cohorts; in the fourth, there was no statistically significant difference. This study underscores (1) the paucity of FAIR data being shared by researchers despite the glut of COVID-19 prediction models and (2) the difficulty of creating any model that consistently outperforms an age-only model, given the cohort diversity of available datasets.
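
The benchmark comparison described in the abstract could be set up roughly as follows. This is a minimal sketch, not the authors' code: it assumes that discrimination is summarized by the area under the ROC curve (AUC) and that the difference between two models is judged with a percentile bootstrap, neither of which the abstract specifies. The function names `auc` and `bootstrap_auc_difference` and the toy cohort are hypothetical and purely illustrative.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: chance that a random death (label 1) scores
    higher than a random survivor (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(model_scores, age_scores, labels,
                             n_boot=500, seed=0):
    """Percentile 95% interval for AUC(model) - AUC(age-only),
    obtained by resampling patients with replacement."""
    if len(set(labels)) < 2:
        raise ValueError("both outcome classes are required")
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        m = [model_scores[i] for i in idx]
        a = [age_scores[i] for i in idx]
        diffs.append(auc(m, ys) - auc(a, ys))
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


if __name__ == "__main__":
    # Toy, made-up cohort: 1 = died, 0 = survived. Not real patient data.
    died = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
    age = [34, 51, 47, 66, 72, 58, 81, 63, 77, 69]  # age-only "model"
    gm = [0.05, 0.10, 0.20, 0.55, 0.30, 0.60, 0.90, 0.25, 0.80, 0.70]
    low, high = bootstrap_auc_difference(gm, age, died)
    print("AUC age-only:", auc(age, died))
    print("AUC model:", auc(gm, died))
    print("95% interval for the difference:", (low, high))
```

Under these assumptions, an interval for the difference that contains zero would correspond to the "no statistically significant difference" outcome reported for the fourth external cohort, while an interval entirely above zero would correspond to the model surpassing the age-only benchmark.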

Highlights

We conducted a systematic survey of COVID-19 endpoint prediction literature to: (a) identify publications that include data that adhere to FAIR principles and (b) develop and reuse mortality prediction models that best generalize to these datasets.
COVID-19 has a psychological impact, with various groups of people in society being at risk of developing anxiety or stress as a result of quarantine and, in the case of healthcare workers, a changed work dynamic [3].
Of the 168 articles summarized from the review paper of Wynants et al. [6], 111 did not have any data availability statement.