Abstract

Purpose: This study aimed to train and validate deep learning (DL) models for differentiating malignant from benign thyroid nodules on ultrasound (US) images and to compare their performance with that of radiologists.

Methods: Images of thyroid nodules in patients who underwent US-guided fine-needle aspiration biopsy at our institution between January 2010 and March 2020 were retrospectively reviewed. Four radiologists independently classified the images. Three image classification DL models (VGG16, VGG19, and ResNet) were trained on the thyroid nodule images. The diagnostic performance of the DL models was calculated for the internal and external test sets and compared with the diagnoses of the four radiologists. Pairwise comparisons of the AUCs between the radiologists and the DL models were made using bootstrap-based tests.

Results: In total, 15,409 images from 7,321 patients (mean age, 60 ± 13 years; malignant nodules, 20.7%) were randomly divided into training (n = 12,327) and validation (n = 3,082) sets. Independent internal (n = 432; 197 patients) and external (n = 168; 59 patients) test sets were also acquired. The DL models demonstrated higher diagnostic performance than the radiologists in the internal test set (AUC, 0.83–0.86 vs. 0.71–0.76; P < 0.05), but not in the external test set. The VGG16 model demonstrated the highest diagnostic performance in both the internal (AUC, 0.86; sensitivity, 91.8%; specificity, 73.2%) and external (AUC, 0.83; sensitivity, 78.6%; specificity, 76.8%) test sets. However, no statistically significant differences were found in the AUCs among the DL models.

Conclusions: The DL models demonstrated diagnostic performance comparable to that of radiologists in distinguishing benign from malignant thyroid nodules on US images and may help augment radiologists' diagnosis of thyroid nodules.
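
The abstract mentions bootstrap-based pairwise comparisons of AUCs but does not spell out the procedure. One common form of such a test can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `auc` and `paired_bootstrap_auc_test`, the number of resamples, and the normal-approximation p-value are all choices made here for the example.

```python
import math

import numpy as np


def auc(y_true, scores):
    """AUC from the Mann-Whitney rank statistic; tied scores get average ranks."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores), dtype=float)
    i = 0
    while i < len(scores):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average 1-based rank
        i = j + 1
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def paired_bootstrap_auc_test(y_true, scores_a, scores_b, n_boot=2000, seed=0):
    """Two-sided test of AUC(a) - AUC(b) for two readers scored on the same cases.

    Assumption of this sketch: cases are resampled with replacement, and the
    observed difference is divided by the bootstrap standard error and
    compared with a standard normal distribution.
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    observed = auc(y_true, scores_a) - auc(y_true, scores_b)
    n = len(y_true)
    diffs = []
    while len(diffs) < n_boot:
        idx = rng.integers(0, n, size=n)
        y = y_true[idx]
        if y.min() == y.max():
            continue  # resample lacks one class, so AUC is undefined
        diffs.append(auc(y, scores_a[idx]) - auc(y, scores_b[idx]))
    se = float(np.std(diffs, ddof=1))
    if se == 0.0:
        return observed, (1.0 if observed == 0.0 else 0.0)
    z = observed / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return observed, p_value
```

In this sketch, `y_true` would hold the biopsy-confirmed labels (1 for malignant), `scores_a` a model's predicted probabilities, and `scores_b` a radiologist's ratings for the same images. Resampling whole cases keeps the pairing between the two readers intact.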
