Toward Improved Radiologic Diagnostics: Investigating the Utility and Limitations of GPT-3.5 Turbo and GPT-4 with Quiz Cases.

Tomohiro Kikuchi, Takahiro Nakao, Yuta Nakamura, Shouhei Hanaoka, Harushi Mori, Takeharu Yoshikawa

doi:10.3174/ajnr.a8332

Abstract

The rise of large language models such as generative pretrained transformers (GPTs) has sparked considerable interest in radiology, especially in interpreting radiologic reports and image findings. While existing research has focused on GPTs estimating diagnoses from radiologic descriptions, exploring alternative sources of diagnostic information is also crucial. This study introduces the use of GPTs (GPT-3.5 Turbo and GPT-4) for information retrieval and summarization, searching for relevant case reports via PubMed, and investigates their potential to aid diagnosis.

We selected 115 cases from the "Case of the Week" series on the American Journal of Neuroradiology website, covering October 2021 to December 2023. Their Description and Legend sections were presented to the GPTs for 2 tasks. For the Direct Diagnosis task, the models provided 3 differential diagnoses, which were considered correct if they matched the diagnosis in the Diagnosis section. For the Case Report Search task, the models generated 2 keywords per case, which were used to build PubMed search queries that extracted up to 3 relevant reports. A response was considered correct if the extracted reports contained the disease name stated in the Diagnosis section. The McNemar test was used to evaluate whether adding Case Report Search to Direct Diagnosis improved overall accuracy.

In the Direct Diagnosis task, GPT-3.5 Turbo achieved a correct response rate of 26% (30/115 cases), whereas GPT-4 achieved 41% (47/115). For the Case Report Search task, GPT-3.5 Turbo scored 10% (11/115), and GPT-4 scored 7% (8/115). Combining both tasks, correct responses totaled 32% (37/115) for GPT-3.5 Turbo, with 3 cases answered correctly in both tasks, and 43% (50/115) for GPT-4, with 5 such cases. Adding Case Report Search improved GPT-3.5 Turbo's performance (P = .023) but not that of GPT-4 (P = .248).
The benefit of adding Case Report Search was particularly pronounced for GPT-3.5 Turbo, suggesting its potential as an alternative diagnostic approach, especially in scenarios where direct diagnoses from GPTs are not obtainable. Nevertheless, the overall performance of GPT models in both direct diagnosis and case report retrieval remains suboptimal, and users should be aware of their limitations.
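
The reported percentages and P values can be approximately reproduced from the counts in the abstract. The sketch below is illustrative only and rests on two assumptions that the abstract does not state: that the McNemar test was the chi-square variant with continuity correction, and that the discordant pairs are the cases gained by adding Case Report Search (combined correct minus Direct Diagnosis correct), with no cases lost. The function names `mcnemar_cc` and `summarize` are hypothetical.

```python
import math

N_CASES = 115  # number of quiz cases stated in the abstract


def mcnemar_cc(b: int, c: int) -> float:
    """Two-sided P value for McNemar's chi-square test with continuity
    correction (1 degree of freedom). b and c are the discordant counts."""
    if b + c == 0:
        return 1.0
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 df:
    # P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def summarize(direct: int, combined: int, n: int = N_CASES) -> dict:
    """Accuracy before and after adding Case Report Search.

    Assumption: adding a search can only add correct cases, so the
    discordant pairs are (gained, 0)."""
    gained = combined - direct
    return {
        "direct_pct": round(100 * direct / n),
        "combined_pct": round(100 * combined / n),
        "gained": gained,
        "p_value": mcnemar_cc(gained, 0),
    }


# (Direct Diagnosis correct, combined correct) as reported in the abstract
reported = {"GPT-3.5 Turbo": (30, 37), "GPT-4": (47, 50)}

for model, (direct, combined) in reported.items():
    s = summarize(direct, combined)
    print(f"{model}: {s['direct_pct']}% -> {s['combined_pct']}%, "
          f"gained {s['gained']}, P = {s['p_value']:.3f}")
```

Under these assumptions the computed P values round to .023 for GPT-3.5 Turbo and .248 for GPT-4, which agrees with the reported figures; an exact binomial McNemar test would give different values.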

Journal: AJNR. American journal of neuroradiology
Publication Date: May 7, 2024
Citations: 1

Similar Papers

CytoJournal Quiz Cases: Publishing open-access, PubMed-searchable cytopathology cases with educational messages.
Katherine G Akers ... Vinod B Shidham
27 Jul 2016
CytoJournal | VOL. 13

An Unusual Pulmonary Mass With Mediastinal Invasion and Multiple Intrapulmonary Nodules in a 52-Year-Old Man
Chih-Jen Yang ... Yi-Chang Liu
01 Jan 2012
Chest | VOL. 141

Improving Competence in Diagnosis, Referral and Initial Management of Acquired Hemophilia A through Online Interactive Case-Based Education
Christy Rohani-Montez ... Alice Ma
05 Nov 2021
Blood | VOL. 138

Interactive Case-Based Education on Pyruvate Kinase Deficiency Improves Hematologists' Competence in Implementing the Latest Diagnostic and Management Approaches
S Christy Rohani-Montez ... Hanny Al-Samkari
28 Nov 2023
Blood | VOL. 142
