Budget constrained machine learning for early prediction of adverse outcomes for COVID-19 patients

Sam Nguyen, Deepa Mukundan, Joan M Duggan, Jennifer A Hanrahan, Mark Work, Lucas Womack, Paul Kiszka, Ryan Chan, Jose Cadena, Steven T Haller, Braden Soper, David J Kennedy, Priyadip Ray

doi:10.1038/s41598-021-98071-z

Abstract

The combination of machine learning (ML) and electronic health records (EHR) data may be able to improve outcomes of hospitalized COVID-19 patients through improved risk stratification and patient outcome prediction. However, in resource-constrained environments the clinical utility of such data-driven predictive tools may be limited by the cost or unavailability of certain laboratory tests. We leveraged EHR data to develop an ML-based tool for predicting adverse outcomes that optimizes clinical utility under a given cost structure. We further gained insights into the decision-making process of the ML models through an explainable AI tool. This cohort study was performed using deidentified EHR data from COVID-19 patients from ProMedica Health System in northwest Ohio and southeastern Michigan. We tested the performance of various ML approaches for predicting either increasing ventilatory support or mortality. We performed post hoc analysis to obtain optimal feature sets under various budget constraints. We demonstrate that it is possible to achieve a significant reduction in cost at the expense of a small reduction in predictive performance. For example, when predicting ventilation, it is possible to achieve a 43% reduction in cost with only a 3% reduction in performance. Similarly, when predicting mortality, it is possible to achieve a 50% reduction in cost with only a 1% reduction in performance. This study presents a quick, accurate, and cost-effective method to evaluate risk of deterioration for patients with SARS-CoV-2 infection at the time of clinical evaluation.

Highlights

The combination of machine learning (ML) and electronic health records (EHR) data may be able to improve outcomes of hospitalized COVID-19 patients through improved risk stratification and patient outcome prediction
With the increasing availability of electronic health records (EHRs) of hospitalized COVID-19 patients, data-driven decision support systems, such as those based on Machine Learning (ML) methodologies, have been explored extensively in the recent literature as a means of triaging patients with COVID-19 at the point of contact with the health care system[3,4,5,6]
There exists a trade-off between an ML model’s predictive performance and interpretability: Linear models are highly interpretable, but they may not have enough capacity to capture the complexity of EHR data, whereas non-linear models typically provide better predictive performance, but they can be hard to interpret

Summary

Introduction

The combination of machine learning (ML) and electronic health records (EHR) data may be able to improve outcomes of hospitalized COVID-19 patients through improved risk stratification and patient outcome prediction. While recent studies have considered the interpretability of ML algorithms that triage COVID-19 patients based on clinical features, the availability and cost of such clinical features have largely been ignored. This is an important consideration, since many hospitals reached near full capacity at the peak of the pandemic, calling into question the economic sustainability and ethics of resource allocation in the healthcare system. Recent studies have found that patients are often over-diagnosed through unnecessary testing, which may delay care for patients with a more immediate need for medical attention. This suggests that taking the cost of diagnostic testing into account when building ML decision support tools can not only help satisfy budget constraints in resource-constrained environments but may also lead to better patient-centered outcomes.
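To make the idea of choosing features under a budget concrete, the following is a minimal Python sketch of an exhaustive search over feature subsets whose total cost fits a given budget. It is not the authors' procedure, which this summary does not describe in detail. The feature names, cost units, importance points, and the additive scoring function are all hypothetical placeholders chosen for illustration; in practice the scoring function would more plausibly be a validation metric of a model retrained on each candidate subset.

```python
from itertools import combinations

# Hypothetical per-feature acquisition costs (arbitrary units); not study data.
COSTS = {"age": 0, "oxygen_saturation": 1, "crp": 5, "d_dimer": 8, "ferritin": 6}

# Hypothetical importance points per feature; not study data.
IMPORTANCE = {"age": 10, "oxygen_saturation": 30, "crp": 20, "d_dimer": 25, "ferritin": 15}


def additive_score(subset):
    """Illustrative proxy: sum of importance points of the chosen features."""
    return sum(IMPORTANCE[name] for name in subset)


def best_subset_under_budget(costs, score_fn, budget):
    """Return (subset, total_cost, score) maximizing score_fn within the budget.

    Exhaustive search over all subsets, so only practical for a small
    number of candidate features.
    """
    names = sorted(costs)
    best = ((), 0, score_fn(()))  # start from the empty feature set
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            total_cost = sum(costs[name] for name in subset)
            if total_cost > budget:
                continue  # violates the budget constraint
            score = score_fn(subset)
            if score > best[2]:
                best = (subset, total_cost, score)
    return best


if __name__ == "__main__":
    print(best_subset_under_budget(COSTS, additive_score, budget=7))
```

Sweeping `budget` over a range of values and recording the best score at each level would trace out a cost versus performance curve, which is the kind of trade-off the abstract reports.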

Methods

Results

Discussion

Conclusion