Assessing the feasibility of ChatGPT-4o and Claude 3-Opus in thyroid nodule classification based on ultrasound images.

Ziman Chen, Nonhlanhla Chambara, Chaoqun Wu, Xina Lo, Shirley Yuk Wah Liu, Simon Takadiyi Gunda, Xinyang Han, Jingguo Qu, Fei Chen, Michael Tin Cheung Ying

doi:10.1007/s12020-024-04066-x

Abstract

Large language models (LLMs) are pivotal in artificial intelligence. They have advanced capabilities in natural language understanding and multimodal interaction and significant potential for medical applications. This study explores the feasibility and efficacy of two LLMs, ChatGPT-4o and Claude 3-Opus, in classifying thyroid nodules from ultrasound images. The study included 112 patients with a total of 116 thyroid nodules (75 benign and 41 malignant). Ultrasound images of these nodules were analyzed with ChatGPT-4o and Claude 3-Opus to classify each nodule as benign or malignant. A junior radiologist also performed an independent evaluation. Diagnostic performance was assessed with Cohen's kappa and receiver operating characteristic (ROC) curve analysis, using pathological diagnosis as the reference standard. ChatGPT-4o showed poor agreement with the pathological results (kappa = 0.116), and Claude 3-Opus showed even lower agreement (kappa = 0.034). The junior radiologist showed moderate agreement (kappa = 0.450). ChatGPT-4o achieved an area under the ROC curve (AUC) of 57.0% (95% CI: 48.6-65.5%), slightly outperforming Claude 3-Opus (AUC 52.0%, 95% CI: 43.2-60.9%). In contrast, the junior radiologist achieved a significantly higher AUC of 72.4% (95% CI: 63.7-81.1%). The unnecessary biopsy rates were 41.4% for ChatGPT-4o, 43.1% for Claude 3-Opus, and 12.1% for the junior radiologist. LLMs such as ChatGPT-4o and Claude 3-Opus show promise for future applications in medical imaging, but their accuracy is currently limited, so their use in clinical diagnosis should be approached with caution.
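The abstract reports three summary measures: Cohen's kappa, ROC AUC, and the unnecessary biopsy rate. The following is a minimal sketch of how such measures can be computed from binary calls (1 = malignant, 0 = benign) against pathology labels. It is not the study's code. The function names and toy labels are hypothetical, and two definitions are assumptions because the abstract does not spell them out. First, the AUC is taken as the area of the single-operating-point ROC curve for hard benign/malignant calls, which equals (sensitivity + specificity) / 2. Second, the unnecessary biopsy rate is taken as benign nodules called malignant divided by all nodules.

```python
"""Minimal sketch: agreement and diagnostic metrics for binary nodule calls.

ASSUMPTIONS (not taken from the study): labels are 1 = malignant, 0 = benign;
function names and the toy data below are hypothetical illustrations.
"""
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) against the reference (pathology) labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def cohens_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two binary raters."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n  # observed agreement
    # Chance agreement: product of each rater's marginal rates, summed over both classes.
    p_e = ((tp + fn) / n) * ((tp + fp) / n) + ((tn + fp) / n) * ((tn + fn) / n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def binary_auc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """ROC AUC for hard 0/1 predictions (ASSUMPTION: single operating point).

    The ROC curve runs (0,0) -> (FPR, TPR) -> (1,1); the area under these two
    line segments equals (sensitivity + specificity) / 2.
    """
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def unnecessary_biopsy_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """ASSUMED definition: benign nodules called malignant / all nodules."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return fp / (tp + fp + fn + tn)


# Hypothetical toy labels (NOT study data): 4 malignant and 6 benign nodules.
TOY_TRUE = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
TOY_PRED = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    print(f"kappa = {cohens_kappa(TOY_TRUE, TOY_PRED):.3f}")
    print(f"AUC = {binary_auc(TOY_TRUE, TOY_PRED):.3f}")
    print(f"unnecessary biopsy rate = {unnecessary_biopsy_rate(TOY_TRUE, TOY_PRED):.3f}")
```

For the toy labels, the script prints kappa = 0.348, AUC = 0.667, and an unnecessary biopsy rate of 0.100. The sketch does not compute the 95% confidence intervals reported in the abstract. The abstract does not state how those intervals were derived, and a bootstrap over nodules is only one possible approach.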



Journal: Endocrine
Publication Date: Oct 11, 2024
License type: CC BY 4.0

Similar Papers

Is Chinese Thyroid Imaging Reporting and Data Systems superior to American College of Radiology or American Thyroid Association guidelines for consistency and efficacy in the diagnosis of thyroid cancer?
Na Li ... Ruijao Chang
Chinese Medical Journal, Vol. 135, 05 Aug 2022

Clinical Value of Shear Wave Elastography Color Scores in Classifying Thyroid Nodules.
Yan-Xia Zhang ... Hui-Zhan Li
International Journal of General Medicine, Vol. 14, 01 Nov 2021

Ultrasonographic elastography of thyroid nodules: Is adding strain ratio to colour mapping better?
Y Chong ... B.-K Han
Clinical Radiology, Vol. 68, 20 Aug 2013

Investigating the diagnostic efficiency of a computer-aided diagnosis system for thyroid nodules in the context of Hashimoto's thyroiditis.
Liu Gong ... Jia-Le Li
Frontiers in Oncology, Vol. 12, 05 Jan 2023

