Development of a machine learning-based fine-grained risk stratification system for thyroid nodules using predefined clinicoradiological features.

Eun Ju Ha, Jeong Hoon Lee, Dong Gyu Na, Da Hyun Lee, Ji-Hoon Kim

doi:10.1007/s00330-022-09376-0

Abstract

We constructed and validated a machine learning-based malignancy risk estimation model using predefined clinicoradiological features, and evaluated its clinical utility for the management of thyroid nodules. In total, 5708 benign (n = 4597) and malignant (n = 1111) thyroid nodules were collected from 5081 consecutive patients treated in 26 institutions. Seventeen experienced radiologists evaluated nodule characteristics on ultrasonographic images. Eight predictive models were used to stratify the thyroid nodules according to malignancy risk; model performance was assessed via nested 10-fold cross-validation. The best-performing algorithm was externally validated using data for 454 thyroid nodules from a tertiary hospital, then compared to the Thyroid Imaging Reporting and Data System (TIRADS)-based interpretations of radiologists (American College of Radiology, European and Korean TIRADS, and AACE/ACE/AME guidelines). The areas under the receiver operating characteristic curve (AUROCs) of the algorithms ranged from 0.773 to 0.862. The sensitivities, specificities, positive predictive values, and negative predictive values of the best-performing models were 74.1-76.6%, 80.9-83.4%, 49.2-51.9%, and 93.0-93.5%, respectively. In the external validation set, the corresponding ElasticNet values were 83.2%, 89.2%, 81.8%, and 90.1%, and the corresponding TIRADS values were 66.5-85.0%, 61.3-80.8%, 45.9-72.1%, and 81.5-90.3%. The new model exhibited a significantly higher AUROC and specificity than the TIRADS-based risk stratification, although its sensitivity was similar. We developed a reliable machine learning-based predictive model that demonstrated enhanced specificity when stratifying thyroid nodules according to malignancy risk. This system will contribute to improved personalized management of thyroid nodules.
• The area under the receiver operating characteristic (AUROC) curve, sensitivity, and specificity of our model were 0.914, 83.2%, and 89.2%, respectively (derived using the validation dataset).
• Compared to the TIRADS values, the AUROC and specificity are significantly higher, while the sensitivity is similar.
• An interactive version of our AI algorithm is at http://tirads.cdss.co.kr.
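
The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) quoted above all follow from a 2 x 2 confusion matrix. The sketch below shows the arithmetic in Python. The function name `diagnostic_metrics` is hypothetical, and the counts are an assumption: they are not reported in the abstract, but were chosen so that they sum to the 454 external validation nodules and round to the reported ElasticNet percentages. They should be read as an illustration, not as the study's actual data.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return sensitivity, specificity, PPV, and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # malignant nodules correctly flagged
        "specificity": tn / (tn + fp),  # benign nodules correctly cleared
        "ppv": tp / (tp + fp),          # flagged nodules that are malignant
        "npv": tn / (tn + fn),          # cleared nodules that are benign
    }


# ASSUMPTION: illustrative counts, not taken from the paper. They total 454
# nodules and were picked to be consistent with the reported percentages.
counts = {"tp": 139, "fn": 28, "fp": 31, "tn": 256}

metrics = diagnostic_metrics(**counts)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because PPV and NPV depend on the proportion of malignant nodules in the sample, the same sensitivity and specificity would yield different predictive values in a cohort with a different malignancy rate, which is one reason the development and external validation figures are not directly interchangeable.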

Full Text