ChatGPT's diagnostic performance based on textual vs. visual information compared to radiologists' diagnostic performance in musculoskeletal radiology.

Daisuke Horiuchi, Hiroyuki Tatekawa, Tatsushi Oura, Taro Shimono, Shannon L Walston, Hirotaka Takita, Shu Matsushita, Yasuhito Mitsuyama, Yukio Miki, Daiju Ueda

doi:10.1007/s00330-024-10902-5

Abstract

To compare the diagnostic accuracy of Generative Pre-trained Transformer (GPT)-4-based ChatGPT, GPT-4 with vision (GPT-4V) based ChatGPT, and radiologists in musculoskeletal radiology.

We included 106 "Test Yourself" cases from Skeletal Radiology between January 2014 and September 2023. We input the medical history and imaging findings into GPT-4-based ChatGPT and the medical history and images into GPT-4V-based ChatGPT, then both generated a diagnosis for each case. Two radiologists (a radiology resident and a board-certified radiologist) independently provided diagnoses for all cases. The diagnostic accuracy rates were determined based on the published ground truth. Chi-square tests were performed to compare the diagnostic accuracy of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and radiologists.

GPT-4-based ChatGPT significantly outperformed GPT-4V-based ChatGPT (p < 0.001) with accuracy rates of 43% (46/106) and 8% (9/106), respectively. The radiology resident and the board-certified radiologist achieved accuracy rates of 41% (43/106) and 53% (56/106). The diagnostic accuracy of GPT-4-based ChatGPT was comparable to that of the radiology resident, but was lower than that of the board-certified radiologist although the differences were not significant (p = 0.78 and 0.22, respectively). The diagnostic accuracy of GPT-4V-based ChatGPT was significantly lower than those of both radiologists (p < 0.001 and < 0.001, respectively).

GPT-4-based ChatGPT demonstrated significantly higher diagnostic accuracy than GPT-4V-based ChatGPT. While GPT-4-based ChatGPT's diagnostic performance was comparable to radiology residents, it did not reach the performance level of board-certified radiologists in musculoskeletal radiology.

GPT-4-based ChatGPT outperformed GPT-4V-based ChatGPT and was comparable to radiology residents, but it did not reach the level of board-certified radiologists in musculoskeletal radiology. Radiologists should comprehend ChatGPT's current performance as a diagnostic tool for optimal utilization.

This study compared the diagnostic performance of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and radiologists in musculoskeletal radiology. GPT-4-based ChatGPT was comparable to radiology residents, but did not reach the level of board-certified radiologists. When utilizing ChatGPT, it is crucial to input appropriate descriptions of imaging findings rather than the images.
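
The pairwise comparisons reported in the abstract can be approximated from the published counts alone. The following is a minimal sketch, not the authors' analysis code: it assumes a 2x2 chi-square test with Yates continuity correction (the abstract says only "chi-square tests"), and the function and variable names are illustrative. Only the counts (46, 9, 43, and 56 correct out of 106) come from the abstract.

```python
import math

# Correct diagnoses out of 106 cases, as reported in the abstract.
N_CASES = 106
CORRECT = {
    "GPT-4": 46,
    "GPT-4V": 9,
    "resident": 43,
    "board-certified": 56,
}


def accuracy(reader):
    """Fraction of the 106 cases the reader diagnosed correctly."""
    return CORRECT[reader] / N_CASES


def yates_chi_square_2x2(correct_a, correct_b, n_a, n_b):
    """Chi-square test for two proportions with Yates continuity correction.

    Assumption: the continuity correction is our choice; the abstract does
    not state which chi-square variant was used.
    Returns (chi2 statistic, two-sided p-value) for 1 degree of freedom.
    """
    observed = [
        [correct_a, n_a - correct_a],
        [correct_b, n_b - correct_b],
    ]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [correct_a + correct_b, total - correct_a - correct_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            # Yates correction: shrink each |observed - expected| by 0.5.
            diff = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            chi2 += diff ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def compare(reader_a, reader_b):
    """Compare two readers who each diagnosed the same 106 cases."""
    return yates_chi_square_2x2(
        CORRECT[reader_a], CORRECT[reader_b], N_CASES, N_CASES
    )


if __name__ == "__main__":
    for name in CORRECT:
        print(f"{name}: {accuracy(name):.1%}")
    pairs = [
        ("GPT-4", "GPT-4V"),
        ("GPT-4", "resident"),
        ("GPT-4", "board-certified"),
        ("GPT-4V", "resident"),
        ("GPT-4V", "board-certified"),
    ]
    for a, b in pairs:
        chi2, p = compare(a, b)
        print(f"{a} vs {b}: chi2 = {chi2:.2f}, p = {p:.3g}")
```

Under this assumption, the p-values for GPT-4-based ChatGPT against the resident and the board-certified radiologist round to the reported 0.78 and 0.22, which suggests, but does not confirm, that a continuity correction was applied. Note that this test treats the two readers' results as independent samples even though all readers saw the same cases.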

Journal: European Radiology
Publication Date: Jul 12, 2024
Citations: 1
License type: CC BY 4.0

Similar Papers

Comparing the Diagnostic Performance of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and Radiologists in Challenging Neuroradiology Cases.
Daisuke Horiuchi ... Yukio Miki
Clinical Neuroradiology | 28 May 2024

Artificial Intelligence in Chest Radiography Reporting Accuracy: Added Clinical Value in the Emergency Unit Setting Without 24/7 Radiology Coverage.
Jan Rudolph ...
Investigative Radiology | VOL. 57 | 06 Aug 2021

Accuracy of ChatGPT generated diagnosis from patient's medical history and imaging findings in neuroradiology cases.
Daisuke Horiuchi ... Daiju Ueda
Neuroradiology | VOL. 66 | 23 Nov 2023

Revolution or risk?-Assessing the potential and challenges of GPT-4V in radiologic image interpretation.
Marc Sebastian Huppertz ... Sven Nebelung
European Radiology | 18 Oct 2024
