Deep learning models for thyroid nodules diagnosis of fine-needle aspiration biopsy: a retrospective, prospective, multicentre study in China

Jue Wang, Nafen Zheng, Huan Wan, Qinyue Yao, Shijun Jia, Xin Zhang, Sha Fu, Jingliang Ruan, Gui He, Xulin Chen, Suiping Li, Rui Chen, Boan Lai, Jin Wang, Qingping Jiang, Nengtai Ouyang, Yin Zhang

doi:10.1016/s2589-7500(24)00085-2

Abstract

Accurately distinguishing between malignant and benign thyroid nodules through fine-needle aspiration cytopathology is crucial for appropriate therapeutic intervention. However, cytopathologic diagnosis is time-consuming and hindered by a shortage of experienced cytopathologists. Reliable assistive tools could improve the efficiency and accuracy of cytopathologic diagnosis. We aimed to develop and test an artificial intelligence (AI)-assistive system for thyroid cytopathologic diagnosis according to The Bethesda System for Reporting Thyroid Cytopathology (TBSRTC).

11 254 whole-slide images (WSIs) from 4037 patients were used to train deep learning models. Selected WSIs were manually annotated at the cell level by cytopathologists according to the second edition of the TBSRTC guidelines (2017 version). A retrospective dataset of 5638 WSIs from 2914 patients at four medical centres was used for validation. For the prospective study of AI model performance, 469 patients were recruited, contributing 537 thyroid nodule samples. Cohorts for training and validation were enrolled between Jan 1, 2016, and Aug 1, 2022, and the prospective dataset was recruited between Aug 1, 2022, and Jan 1, 2023. The performance of our AI models was estimated as the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, accuracy, positive predictive value, and negative predictive value. The primary outcomes were the sensitivity and specificity of the model's predictions in assisting cyto-diagnosis of thyroid nodules.
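As a reading aid, the performance measures named above can be sketched in a few lines of Python. This is a minimal illustration of the standard definitions, not the study's evaluation code: the function names `binary_metrics` and `auroc`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for the example.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for binary labels (1 = positive, 0 = negative).

    Assumes both classes and both predicted outcomes occur at least once,
    so that no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 positive and 6 negative samples with model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # illustrative 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
area = auroc(y_true, scores)
```

Sensitivity, specificity, accuracy, and the predictive values depend on the chosen threshold, whereas AUROC summarises ranking quality across all thresholds, which is presumably why both kinds of measure are reported.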
The AUROC of TBSRTC III+ (which distinguishes benign from TBSRTC classes III, IV, V, and VI) was 0·930 (95% CI 0·921-0·939) for Sun Yat-sen Memorial Hospital of Sun Yat-sen University (SYSMH) internal validation and 0·944 (0·929-0·959), 0·939 (0·924-0·955), and 0·971 (0·938-1·000) for The First People's Hospital of Foshan (FPHF), Sichuan Cancer Hospital & Institute (SCHI), and The Third Affiliated Hospital of Guangzhou Medical University (TAHGMU) medical centres, respectively. The AUROC of TBSRTC V+ (which distinguishes benign from TBSRTC classes V and VI) was 0·990 (95% CI 0·986-0·995) for SYSMH internal validation and 0·988 (0·980-0·995), 0·965 (0·953-0·977), and 0·991 (0·972-1·000) for FPHF, SCHI, and TAHGMU medical centres, respectively. For the prospective study at SYSMH, the AUROCs of TBSRTC III+ and TBSRTC V+ were 0·977 and 0·981, respectively. With the assistance of AI, the specificity of junior cytopathologists increased from 0·887 (95% CI 0·844-0·922) to 0·993 (0·974-0·999) and their accuracy improved from 0·877 (0·846-0·904) to 0·948 (0·926-0·965). 186 atypia of undetermined significance samples from 186 patients with BRAF mutation information were collected, 43 of which harboured the BRAF V600E mutation. 91% (39/43) of BRAF V600E-positive atypia of undetermined significance samples were identified as malignant by the AI models.

In this study, we developed an AI-assisted model named the Thyroid Patch-Oriented WSI Ensemble Recognition (ThyroPower) system, which facilitates rapid and robust cyto-diagnosis of thyroid nodules, potentially enhancing the diagnostic capabilities of cytopathologists. Moreover, it serves as a potential solution to mitigate the scarcity of cytopathologists.

Funding: Guangdong Science and Technology Department.

For the Chinese translation of the abstract see Supplementary Materials section.

Full Text