Accurate preoperative prediction of cervical lymph node metastasis (LNM) in patients with papillary thyroid carcinoma (PTC) is essential for disease staging and individualized treatment planning, which can improve prognosis and facilitate better management. This study aimed to establish a fully automated deep learning-enabled model (FADLM) for automated tumor segmentation and cervical LNM prediction in PTC using ultrasound (US) video keyframes. This two-center study retrospectively enrolled 518 PTC patients, who were then randomly divided into training (Hospital 1, n=340), internal test (Hospital 1, n=83), and external test (Hospital 2, n=95) cohorts. The FADLM integrated a mask region-based convolutional neural network (Mask R-CNN) for automatic segmentation of the primary thyroid tumor and ResNet34 with a Bayes strategy for cervical LNM diagnosis. A radiomics model (RM) using the same automated segmentation method, a traditional radiomics model (TRM) using manual segmentation, and a clinical-semantic model (CSM) were developed for comparison. The Dice similarity coefficient (DSC) was used to evaluate segmentation performance. The predictive performance of the models was evaluated in terms of discrimination and clinical utility using the area under the receiver operating characteristic curve (AUC), heatmap analysis, and decision curve analysis (DCA). Predictive performance was compared among models with the DeLong test. The performance of the two radiologists relative to the FADLM, and the diagnostic improvement achieved with FADLM assistance, were analyzed in terms of accuracy, sensitivity, and specificity using McNemar's χ² test. A p-value below 0.05 was considered statistically significant. The Benjamini-Hochberg procedure was applied to multiple comparisons to control for Type I error. The FADLM yielded promising segmentation results in the training (DSC: 0.88±0.23), internal test (DSC: 0.88±0.23), and external test (DSC: 0.85±0.24) cohorts.
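As an illustration of the segmentation metric, the following minimal sketch computes the Dice similarity coefficient for two binary masks. It is not the study's code: the function name `dice_coefficient`, the nested-list mask representation, and the convention that two empty masks score 1.0 are assumptions made for this example.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice similarity coefficient between two binary masks.

    Masks are nested lists of 0/1 values (hypothetical representation);
    DSC = 2 * |A intersect B| / (|A| + |B|).
    """
    # Flatten both masks into lists of booleans.
    pred = [bool(v) for row in pred_mask for v in row]
    true = [bool(v) for row in true_mask for v in row]
    if len(pred) != len(true):
        raise ValueError("masks must contain the same number of pixels")

    intersection = sum(p and t for p, t in zip(pred, true))
    total = sum(pred) + sum(true)

    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
score = dice_coefficient(predicted, reference)  # 2 shared pixels, 3 + 3 foreground pixels
```

A DSC of 1 indicates identical masks and 0 indicates no overlap, so the reported values of 0.85 to 0.88 correspond to substantial overlap between automated and reference segmentations.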
The AUCs of the FADLM for cervical LNM prediction were 0.78 (95% CI: 0.73, 0.83), 0.83 (95% CI: 0.74, 0.92), and 0.83 (95% CI: 0.75, 0.92) in the training, internal test, and external test cohorts, respectively. The FADLM significantly outperformed the RM (AUCs: 0.78 vs. 0.72; 0.83 vs. 0.65; 0.83 vs. 0.68; all adjusted p-values <0.05) and the CSM (AUCs: 0.78 vs. 0.71; 0.83 vs. 0.62; 0.83 vs. 0.68; all adjusted p-values <0.05) across the three cohorts. The RM performed similarly to the TRM (AUCs: 0.61 vs. 0.63, adjusted p-value = 0.60) while significantly reducing segmentation time (3.3±3.8 vs. 14.1±4.2 s, p-value <0.001). With FADLM assistance, the accuracies of the junior and senior radiologists improved by 18% and 15%, respectively, and their sensitivities by 25% and 21%, respectively (all adjusted p-values <0.05), in the external test cohort. The FADLM, with its elaborately designed automated strategy using US video keyframes, holds good potential to provide efficient and consistent prediction of cervical LNM in PTC. The FADLM displays superior performance to the RM, the CSM, and radiologists, with promising efficacy.
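The adjusted p-values reported above come from the Benjamini-Hochberg procedure. The sketch below shows how such adjusted values are commonly computed; it is an illustration only, the function name `benjamini_hochberg` is hypothetical, and the input p-values are made-up numbers, not results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p-values from four hypothetical comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
```

With these illustrative inputs only the first comparison remains below 0.05 after adjustment, which shows how the correction can change which comparisons are declared significant.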