Comparative Analysis of Large Language Models' Performance in Breast Imaging

Muhammed Said Beşler

doi:10.18663/tjcl.1561361

Abstract

Aim: To evaluate the performance of two flagship models, OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet, in breast imaging cases.

Materials and Methods: The dataset consisted of cases from the publicly available Case of the Month archive of the Society of Breast Imaging. Questions were classified as text-based or as containing images from mammography, ultrasound, magnetic resonance imaging, or hybrid imaging. The accuracy rates of GPT-4o and Claude 3.5 Sonnet were compared using the Mann-Whitney U test.

Results: Of the 94 questions in total, 61.7% were image-based. The overall accuracy rate of GPT-4o was higher than that of Claude 3.5 Sonnet (75.4% vs. 67.7%, p=0.432). GPT-4o achieved higher scores on questions based on ultrasound and hybrid imaging, while Claude 3.5 Sonnet performed better on mammography-based questions. In tumor group cases, both models reached higher accuracy rates than in the non-tumor group (p>0.05 for both). The models' performance in breast imaging cases overall exceeded 75%, ranging from 64% to 83% for questions involving different imaging modalities.

Conclusion: In breast imaging cases, although GPT-4o generally achieved higher accuracy rates than Claude 3.5 Sonnet in image-based and other types of questions, their performances were comparable.

More From: Turkish Journal of Clinics and Laboratory
