Role of visual information in multimodal large language model performance: an evaluation using the Japanese nuclear medicine board examination.

Takashi Watanabe, Akira Baba, Takeshi Fukuda, Ken Watanabe, Jun Woo, Hiroya Ojiri

doi:10.1007/s12149-024-01992-8

Abstract

This study assessed the performance of state-of-the-art multimodal large language models (LLMs), specifically GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro, on Japanese Nuclear Medicine Board Examination (JNMBE) questions and evaluated the influence of visual information on their decision-making. It used 92 questions with images from the JNMBE (2019-2023). The LLMs' responses were assessed under two conditions: text and images together, and text only. Each model answered every question three times, and the most frequent answer choice was taken as its final answer. Accuracy and the agreement rates among the model answers were evaluated using statistical tests. None of the three models showed a significant difference in accuracy between the text-and-image and text-only conditions. GPT-4o and Claude 3 Opus each achieved an accuracy of 54.3% (95% CI: 44.2%-64.1%) when given both text and images; however, they selected the same options as in the text-only condition for 71.7% of the questions. Gemini 1.5 Pro performed significantly worse than GPT-4o in the text-and-image condition. The agreement rates among the model answers ranged from weak to moderate. The influence of images on the decision-making of the latest multimodal LLMs in nuclear medicine is therefore limited, and their diagnostic ability in this highly specialized field remains insufficient. Improving the use of image information and the reproducibility of answers is crucial for the effective application of LLMs in nuclear medicine education and practice. Further advances in these areas are necessary to harness the potential of LLMs as assistants in nuclear medicine diagnosis.
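
The scoring procedure described in the abstract (three runs per question, most frequent choice as the final answer, accuracy with a 95% confidence interval) can be sketched as follows. This is a minimal illustration, not the authors' code. Several points are assumptions: the abstract does not say how ties among the three runs were resolved (here a tie yields no final answer), it does not name the interval method (a Wilson score interval is used here), and the count of 50 correct answers is inferred from 54.3% of 92 questions rather than stated. The function names are hypothetical.

```python
import math
from collections import Counter


def majority_answer(runs):
    """Return the most frequent answer among repeated runs, or None on a tie.

    Assumption: the abstract does not describe tie handling, so a tie
    for first place is reported as None rather than resolved.
    """
    counts = Counter(runs).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for 95%).

    Assumption: the abstract does not state which interval method was used.
    """
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Three runs on one question: two votes for "b" make it the final answer.
print(majority_answer(["b", "b", "d"]))

# 50 correct out of 92 is an inferred count (54.3% of 92), not a reported one.
low, high = wilson_interval(50, 92)
print(f"{50 / 92:.1%} (95% CI: {low:.1%}-{high:.1%})")
```

Under these assumptions the interval evaluates to 44.2%-64.1%, which matches the bounds reported in the abstract, although that agreement alone does not confirm which method the authors used.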

More From: Annals of nuclear medicine

Similar Papers

Disagreements in Medical Ethics Question Answering Between Large Language Models and Physicians.
Shelly Soffer ... Eyal Klang
Research square | 15 Nov 2024

Current application of ChatGPT in undergraduate nuclear medicine education: Taking Chongqing Medical University as an example
Ailin Deng ... Maohua Rao
Medical Teacher | VOL. ahead-of-print | 01 Oct 2024

Generative AI and large language models in nuclear medicine: current status and future prospects.
Kenji Hirata ... Taiki Nozaki
Annals of nuclear medicine | VOL. 38 | 25 Sep 2024

The policies on the use of large language models in radiological journals are lacking: a meta-research study
Jingyu Zhong ... Weiwu Yao
Insights into Imaging | VOL. 15 | 01 Aug 2024
