ChatGPT-4 Omni Performance in USMLE Disciplines and Clinical Skills: Comparative Analysis.

Brenton T Bicknell, Danner Butler, Sydney Whalen, James Ricks, Cory J Dixon, Abigail B Clark, Olivia Spaedy, Adam Skelton, Neel Edupuganti, Lance Dzubinski, Hudson Tate, Garrett Dyess, Brenessa Lindeman, Lisa Soleymani Lehmann

doi:10.2196/63430

Abstract

Recent studies, including those by the National Board of Medical Examiners, have highlighted the remarkable capabilities of large language models (LLMs) such as ChatGPT in passing the United States Medical Licensing Examination (USMLE). However, there is a gap in detailed analysis of LLM performance in specific medical content areas, which limits assessment of their potential utility in medical education.

This study aimed to assess and compare the accuracy of successive ChatGPT versions (ChatGPT 3.5 [GPT-3.5], ChatGPT 4 [GPT-4], and ChatGPT 4 Omni [GPT-4o]) in USMLE disciplines, clinical clerkships, and the clinical skills of diagnostics and management.

The study used 750 clinical vignette-based multiple-choice questions to characterize the performance of the three models across these areas. Accuracy was assessed using a standardized protocol, with statistical analyses conducted to compare the models' performances.

GPT-4o achieved the highest accuracy across the 750 multiple-choice questions at 90.4%, outperforming GPT-4 and GPT-3.5, which scored 81.1% and 60.0%, respectively. GPT-4o's highest performances were in social sciences (95.5%), behavioral and neuroscience (94.2%), and pharmacology (93.2%). In clinical skills, GPT-4o's diagnostic accuracy was 92.7% and its management accuracy was 88.8%, both significantly higher than those of its predecessors. Notably, both GPT-4o and GPT-4 significantly outperformed the medical student average accuracy of 59.3% (95% CI 58.3-60.3).

GPT-4o's performance in USMLE disciplines, clinical clerkships, and clinical skills indicates substantial improvements over its predecessors, suggesting significant potential for the use of this technology as an educational aid for medical students. These findings underscore the need for careful consideration when integrating LLMs into medical education, emphasizing the importance of structured curricula to guide their appropriate use and the need for ongoing critical analyses to ensure their reliability and effectiveness.
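To give a feel for how precise the reported accuracies are on a 750-question set, the following Python sketch computes a Wilson score 95% interval for each model. It is an illustration only: the abstract does not say which statistical tests the authors used, the correct-answer counts are back-calculated here from the reported percentages (an assumption, not reported data), and the function names `wilson_interval` and `summarize` are hypothetical.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Overall accuracies (percent) as reported in the abstract.
REPORTED_PCT = {"GPT-3.5": 60.0, "GPT-4": 81.1, "GPT-4o": 90.4}
TOTAL_QUESTIONS = 750


def summarize(reported_pct, total):
    """Map each model to (assumed correct count, CI lower bound, CI upper bound)."""
    rows = {}
    for model, pct in reported_pct.items():
        # Assumption: the count is recovered by rounding pct * total;
        # the abstract reports percentages only.
        correct = round(pct / 100 * total)
        low, high = wilson_interval(correct, total)
        rows[model] = (correct, low, high)
    return rows


if __name__ == "__main__":
    for model, (correct, low, high) in summarize(REPORTED_PCT, TOTAL_QUESTIONS).items():
        print(f"{model}: {correct}/{TOTAL_QUESTIONS} correct, "
              f"95% CI {low:.3f}-{high:.3f}")
```

Because all three models answered the same questions, a paired comparison (for example McNemar's test) would be the more natural choice for model-versus-model testing, but it needs per-question results that the abstract does not provide.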


Journal: JMIR medical education	Publication Date: Nov 6, 2024
Citations: 1	License type: cc-by


