Diagnostic accuracy of GPT-4 on common clinical scenarios and challenging cases.

Geoffrey W Rutledge

doi:10.1002/lrh2.10438

Abstract

Large language models (LLMs) achieve high diagnostic accuracy when they evaluate previously published clinical cases. We compared the accuracy of GPT-4's differential diagnoses for previously unpublished challenging case scenarios with its diagnostic accuracy for previously published cases. For a set of previously unpublished challenging clinical cases, GPT-4 included the correct diagnosis in its top 6 diagnoses 61.1% of the time, versus the previously reported 49.1% for physicians. For a set of 45 clinical vignettes of more common clinical scenarios, GPT-4 included the correct diagnosis in its top 3 diagnoses 100% of the time, versus the previously reported 84.3% for physicians. GPT-4 performs at a level at least as good as, if not better than, that of experienced physicians on highly challenging cases in internal medicine. The extraordinary performance of GPT-4 on common clinical scenarios could be explained in part by the fact that these cases were previously published and may have been included in the training dataset for this LLM.
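
The figures above are top-k accuracies: the share of cases in which the reference diagnosis appears among the first k entries of a ranked differential. A minimal sketch of that arithmetic follows. The function name, the toy cases, and the exact-string matching rule are all assumptions made for illustration; the abstract does not describe how a listed diagnosis was judged to match the reference, and a real study would likely rely on clinician review rather than string comparison.

```python
from typing import Sequence


def top_k_accuracy(
    differentials: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose reference diagnosis is in the first k ranked entries.

    Matching here is a case-insensitive exact string comparison, which is an
    illustrative assumption and not the scoring method of the study.
    """
    if len(differentials) != len(truths):
        raise ValueError("each case needs exactly one reference diagnosis")
    if not truths:
        raise ValueError("at least one case is required")

    hits = 0
    for ranked, truth in zip(differentials, truths):
        # Only the first k entries of the ranked differential are considered.
        top_k = {d.strip().lower() for d in ranked[:k]}
        if truth.strip().lower() in top_k:
            hits += 1
    return hits / len(truths)


# Hypothetical toy cases, invented for this sketch (not data from the paper).
toy_differentials = [
    ["pneumonia", "bronchitis", "pulmonary embolism"],
    ["migraine", "tension headache", "sinusitis"],
    ["gastritis", "peptic ulcer", "appendicitis"],
    ["cellulitis", "deep vein thrombosis", "gout"],
]
toy_truths = ["pulmonary embolism", "migraine", "cholecystitis", "gout"]

if __name__ == "__main__":
    for k in (1, 3):
        print(f"top-{k} accuracy: {top_k_accuracy(toy_differentials, toy_truths, k):.2f}")
```

Widening k can only keep or raise the score, which is why a top-6 figure and a top-3 figure are not directly comparable unless the case sets and the value of k are the same.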


Journal: Learning Health Systems
Publication Date: Jun 25, 2024
License type: CC BY-NC 4.0

