Computer-aided diagnosis system for thyroid nodules on ultrasonography: diagnostic performance and reproducibility based on the experience level of operators.

Eun Young Jeong, Miran Han, Yoon Joo Cho, Seon Young Park, Hye Lin Kim, Eun Ju Ha

doi:10.1007/s00330-018-5772-9

Abstract

The aim of this study was to evaluate the diagnostic performance and reproducibility of a computer-aided diagnosis (CAD) system for thyroid cancer diagnosis on ultrasonography (US) according to the operator's experience.

Between July 2016 and October 2016, 76 consecutive patients with 100 thyroid nodules (≥ 1.0 cm) were prospectively included. An experienced radiologist performed the US examinations with a real-time CAD system integrated into the US machine, and three operators with different levels of US experience (0-5 years) independently applied the CAD system. We compared the diagnostic performance of the CAD system based on the operators' experience and calculated the interobserver agreement for cancer diagnosis and for each US descriptor.

The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy of the CAD system were 88.6%, 83.9%, 81.3%, 90.4%, and 86.0%, respectively. The sensitivity and accuracy of the CAD system were not significantly different from those of the radiologist (p > 0.05), while the specificity was higher for the experienced radiologist (p = 0.016). For the less-experienced operators, the sensitivity was 68.8-73.8%, specificity 74.1-88.5%, PPV 68.9-73.3%, NPV 72.7-80.0%, and accuracy 71.0-75.0%. The less-experienced operators showed lower sensitivity and accuracy than the experienced radiologist. The interobserver agreement was substantial for the final diagnosis and each US descriptor, and moderate for the margin and composition.

The CAD system may have a potential role in thyroid cancer diagnosis. However, operator dependency remains and needs improvement.

• The sensitivity and accuracy of the CAD system did not differ significantly from those of the experienced radiologist (88.6% vs. 84.1%, p = 0.687; 86.0% vs. 91.0%, p = 0.267), while the specificity was significantly higher for the experienced radiologist (83.9% vs. 96.4%, p = 0.016).
• However, the diagnostic performance varied according to the operator's experience (sensitivity 70.5-88.6%, accuracy 72.0-86.0%), and both were lower for the less-experienced operators than for the experienced radiologist.
• The interobserver agreement was substantial for the final diagnosis and each US descriptor and moderate for the margin and composition.
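
The five performance measures reported above all derive from a single 2×2 table of test result against final diagnosis. The following minimal Python sketch shows how they are computed. The abstract does not report the underlying counts, so the values used here (39 true positives, 5 false negatives, 47 true negatives, 9 false positives among the 100 nodules) are an assumption: they were back-calculated from the reported percentages for the CAD system and should be treated as illustrative only. The names `ConfusionCounts` and `diagnostic_metrics` are hypothetical helpers, not part of the study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a 2x2 table of test result versus final diagnosis."""
    tp: int  # malignant nodules classified as malignant
    fn: int  # malignant nodules classified as benign
    tn: int  # benign nodules classified as benign
    fp: int  # benign nodules classified as malignant


def diagnostic_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Return the standard diagnostic performance measures as percentages."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "sensitivity": 100.0 * c.tp / (c.tp + c.fn),
        "specificity": 100.0 * c.tn / (c.tn + c.fp),
        "ppv": 100.0 * c.tp / (c.tp + c.fp),
        "npv": 100.0 * c.tn / (c.tn + c.fn),
        "accuracy": 100.0 * (c.tp + c.tn) / total,
    }


# ASSUMPTION: these counts are not given in the abstract. They were chosen
# so that the derived percentages match the figures reported for the CAD
# system in 100 nodules.
cad_counts = ConfusionCounts(tp=39, fn=5, tn=47, fp=9)
cad_metrics = diagnostic_metrics(cad_counts)

for name, value in cad_metrics.items():
    print(f"{name}: {value:.2f}%")
```

Under these assumed counts, each computed value agrees with the corresponding reported figure to within rounding, which suggests the reported percentages are mutually consistent for a sample of 44 malignant and 56 benign nodules.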

Full Text