Abstract

"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content. Purpose To investigate the issues of generalizability and replication of deep learning (DL) models by assessing performance of a screening mammography DL system developed at New York University (NYU) on a local Australian dataset. Materials and Methods In this retrospective study, all individuals with biopsy and surgical pathology-proven lesions and age-matched controls were identified from a South Australian public mammography screening program (January 2010 to December 2016). The primary outcome was DL system performance, measured with the area under the receiver operating characteristic curve (AUC), in classifying invasive breast cancer or ductal carcinoma in situ (n = 425) from no malignancy (n = 490) or benign lesions (n = 44) in age-matched controls. The NYU system, including models without (NYU1) and with (NYU2) heatmaps, was tested in its original form, after training from scratch (without transfer learning; TL), after retraining with TL. Results The local test set comprised 959 individuals (mean age, 62.5 years [SD, 8.5]; all female). The original AUCs for the NYU1 and NYU2 models were 0.83 (95%CI = 0.82-0.84) and 0.89 (95%CI = 0.88-0.89), respectively. When applied in their original form to the local test set, the AUCs were 0.76 (95%CI = 0.73-0.79) and 0.84 (95%CI = 0.82-0.87), respectively. After local training without TL, the AUCs were 0.66 (95%CI = 0.62-0.69) and 0.86 (95%CI = 0.84-0.88). After retraining with TL, the AUCs were 0.82 (95%CI = 0.80-0.85) and 0.86 (95%CI = 0.84-0.88). Conclusion A deep learning system developed using a U.S. 
dataset showed reduced performance when applied 'out of the box' to an Australian dataset. Local retraining with transfer learning using available model weights improved model performance. ©RSNA, 2024.
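
The primary outcome is an AUC reported with a 95% CI. The abstract does not state how the intervals were computed, so the following is only a minimal sketch of one common approach: a rank-based AUC with a percentile bootstrap. The function names (`auc`, `bootstrap_ci`), the number of resamples, and the synthetic scores are assumptions made for illustration and are not taken from the study.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (hypothetical helper, not the
    study's method). Cases are resampled with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Synthetic scores only; these are NOT data from the study.
    rng = random.Random(42)
    labels = [1] * 30 + [0] * 30
    scores = [rng.gauss(0.6, 0.2) for _ in range(30)] + \
             [rng.gauss(0.4, 0.2) for _ in range(30)]
    point = auc(labels, scores)
    lower, upper = bootstrap_ci(labels, scores)
    print(f"AUC = {point:.2f} (95% CI = {lower:.2f}-{upper:.2f})")
```

The pairwise comparison is quadratic in the number of cases, which is acceptable for a sketch; for a test set of 959 individuals a rank-based or library implementation would be a more practical choice.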
