Abstract

Objectives: To evaluate the transferability of deep learning (DL) models for the early detection of adverse events to previously unseen hospitals.
Design: Retrospective observational cohort study utilizing harmonized intensive care data from four public datasets.
Setting: ICUs across Europe and the United States.
Patients: Adult patients admitted to the ICU for at least 6 hours who had good data quality.
Interventions: None.
Measurements and Main Results: Using carefully harmonized data from a total of 334,812 ICU stays, we systematically assessed the transferability of DL models for three common adverse events: death, acute kidney injury (AKI), and sepsis. We tested whether using more than one data source and/or algorithmically optimizing for generalizability during training improves model performance at new hospitals. Models achieved a high area under the receiver operating characteristic curve (AUROC) for mortality (0.838-0.869), AKI (0.823-0.866), and sepsis (0.749-0.824) at the training hospital. As expected, AUROC dropped when models were applied at other hospitals, sometimes by as much as 0.200. Training on more than one dataset mitigated the performance drop, with multicenter models performing roughly on par with the best single-center model. Dedicated methods promoting generalizability did not noticeably improve performance in our experiments.
Conclusions: Our results emphasize the importance of diverse training data for DL-based risk prediction. They suggest that as data from more hospitals become available for training, models may become increasingly generalizable. Even so, good performance at a new hospital still depended on the inclusion of compatible hospitals during training.
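The sketch below illustrates, under stated assumptions, the kind of transferability comparison described above: train a model on a single source hospital or on a pool of source hospitals, then score AUROC at a held-out hospital. The synthetic site data, feature dimensions, and the scikit-learn MLP standing in for the paper's DL models are illustrative assumptions, not the authors' pipeline.

```python
# Minimal sketch (not the authors' code) of single-center vs. multicenter
# training evaluated at a previously unseen hospital via AUROC.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_hospital(n, shift):
    """Synthetic stand-in for one harmonized ICU dataset with a site-specific shift."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 20))
    logits = X[:, :5].sum(axis=1) - 2.5 + shift
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)
    return X, y

# Four hypothetical sites; site "D" plays the role of the unseen hospital.
hospitals = {name: make_hospital(4000, shift)
             for name, shift in [("A", 0.0), ("B", 0.3), ("C", -0.2), ("D", 0.5)]}
X_test, y_test = hospitals.pop("D")

def fit_and_score(X_train, y_train):
    """Train a small MLP and return its AUROC at the held-out hospital."""
    model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)
    model.fit(X_train, y_train)
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Single-center training: one source hospital at a time.
for name, (X, y) in hospitals.items():
    print(f"train on {name} only   -> AUROC at held-out site: {fit_and_score(X, y):.3f}")

# Multicenter training: pool all remaining source hospitals.
X_pool = np.vstack([X for X, _ in hospitals.values()])
y_pool = np.concatenate([y for _, y in hospitals.values()])
print(f"train on pooled sites -> AUROC at held-out site: {fit_and_score(X_pool, y_pool):.3f}")
```

In this setup, the gap between the single-center and pooled scores at the held-out site is the quantity of interest; the abstract reports that pooling narrowed that gap without fully closing it.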
