Generalizability issues with deep learning models in medicine and their potential solutions: illustrated with cone-beam computed tomography (CBCT) to computed tomography (CT) image conversion

Xiao Liang, Steve B Jiang, Dan Nguyen

doi:10.1088/2632-2153/abb214

Open Access

Abstract

Generalizability is a concern when applying a deep learning (DL) model trained on one dataset to other datasets. It is challenging to demonstrate a DL model’s generalizability efficiently and sufficiently before implementing the model in clinical practice. Training a universal model that works anywhere, anytime, for anybody is unrealistic. In this work, we demonstrate the generalizability problem, then explore potential solutions based on transfer learning, using the cone-beam computed tomography (CBCT) to computed tomography (CT) image conversion task as the testbed. Previous works studied only one or two anatomical sites and used images from the same vendor’s scanners. Here, we investigated how a model trained for one machine and one anatomical site works on other machines and other anatomical sites. We trained a model on CBCT images acquired from one vendor’s scanners for head and neck cancer patients and applied it to images from another vendor’s scanners and to images of prostate, pancreatic, and cervical cancer patients. We found that generalizability could be a significant problem for this particular application when applying a trained DL model to datasets from another vendor’s scanners. We then explored three practical solutions based on transfer learning to solve this generalization problem: the target model, which is trained on a target dataset from scratch; the combined model, which is trained on both source and target datasets from scratch; and the adapted model, which fine-tunes the trained source model on a target dataset. We found that when the target dataset contains sufficient data, all three models can achieve good performance. When the target dataset is limited, the adapted model works best, which indicates that using the fine-tuning strategy to adapt the trained model to an unseen target dataset is a viable and easy way to implement DL models in the clinic.
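
The three strategies differ only in which data the model is trained on and how its weights are initialized. The sketch below illustrates that distinction under strong simplifying assumptions: a two-parameter linear model trained by gradient descent stands in for the DL network, and noise-free synthetic points stand in for the source and target image datasets. The function names, data, and hyperparameters are all hypothetical and are not taken from this work, and the sketch does not attempt to reproduce the reported results.

```python
# Toy illustration only: a linear model y = w*x + b stands in for the DL network,
# and synthetic points stand in for the source and target datasets (assumptions).

def make_data(w, b, n):
    """Noise-free samples of y = w*x + b on an even grid over [-1, 1]."""
    xs = [-1 + 2 * i / (n - 1) for i in range(n)]
    return [(x, w * x + b) for x in xs]

def train(data, init=(0.0, 0.0), epochs=200, lr=0.05):
    """Full-batch gradient descent on mean squared error, starting from `init`."""
    w, b = init
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def mse(params, data):
    """Mean squared error of the linear model `params` on `data`."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

# Hypothetical source and target domains with different underlying mappings.
source_train = make_data(2.0, 0.5, 41)   # plentiful source data
target_train = make_data(2.3, 0.2, 5)    # limited target data
target_test = make_data(2.3, 0.2, 21)    # held-out target data

# Source model: trained on the source dataset only.
source_model = train(source_train)

# Target model: trained on the target dataset from scratch.
target_model = train(target_train)

# Combined model: trained on source and target datasets together, from scratch.
combined_model = train(source_train + target_train)

# Adapted model: starts from the trained source model's weights and is
# fine-tuned on the target dataset for a short additional run.
adapted_model = train(target_train, init=source_model, epochs=20)

for name, model in [("source", source_model), ("target", target_model),
                    ("combined", combined_model), ("adapted", adapted_model)]:
    print(f"{name:9s} MSE on target test set: {mse(model, target_test):.4f}")
```

In this toy setting the only difference between the target and adapted models is the `init` argument: the adapted model begins from the source model's parameters rather than from zero. Because the data here are noise-free, the example cannot show the small-dataset behavior described above; it only makes the three training setups concrete.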

Highlights

Deep learning (DL) has been increasingly applied in medicine because it can improve the accuracy of diagnosis, prognosis, and treatment decision making by retrieving hidden information from big clinical data, improve efficiency by automating or augmenting clinical procedures, and transfer expertise to less experienced clinicians by learning from experienced clinicians.
The target model is trained on a target dataset starting from scratch.
The generated synthetic CT (sCT) images and their corresponding cone-beam computed tomography (CBCT) and deformed CT (dCT) images from the H&N1 and H&N2 testing datasets are shown in Supplementary Figures 1 and 2 for visual evaluation.

Summary

Introduction

Deep learning (DL) has been increasingly applied in medicine because it can improve the accuracy of diagnosis, prognosis, and treatment decision making by retrieving hidden information from big clinical data, improve efficiency by automating or augmenting clinical procedures, and transfer expertise to less experienced clinicians by learning from experienced clinicians. A better practice is to test the model with an external dataset, which some journals have recently started to require for published DL research (David et al., 2020). To address the problem of model generalizability, many researchers try to collect as much and as diverse patient data as possible to train a DL model that works in any clinical scenario, anytime, anywhere, for anybody. This ambitious goal seems unrealistic, as it is very challenging, if not impossible, to collect patient data from enough medical institutions to represent all clinical scenarios.

Methods

Results

Conclusion