Abstract

The evolving field of medical education is being shaped by technological advancements, including the integration of Large Language Models (LLMs) such as ChatGPT. These models could be invaluable resources for medical students by simplifying complex concepts and enhancing interactive learning through personalized support. LLMs have shown impressive performance on professional examinations, even without domain-specific training, making them particularly relevant in the medical field. This study aims to assess the performance of LLMs on radiology examinations for medical students, thereby shedding light on their current capabilities and implications.

The study was conducted using 151 multiple-choice questions drawn from radiology exams for medical students. The questions were categorized by type and topic and then processed with OpenAI's GPT-3.5 and GPT-4 via their API, or entered manually into Perplexity AI with GPT-3.5 and Bing. LLM performance was evaluated overall, by question type, and by topic.

GPT-3.5 achieved an overall accuracy of 67.6% on all 151 questions, while GPT-4 significantly outperformed it with an overall accuracy of 88.1% (p<0.001). GPT-4 demonstrated superior performance on both lower-order and higher-order questions compared to GPT-3.5, Perplexity AI, and medical students, excelling particularly on higher-order questions. All GPT models would have passed the radiology exam for medical students at our university.

In conclusion, our study highlights the potential of LLMs as accessible knowledge resources for medical students. GPT-4 performed well on lower-order as well as higher-order questions, making it a potentially very useful tool for reviewing radiology exam questions. Radiologists should be aware of ChatGPT's limitations, including its tendency to confidently provide incorrect responses.
· ChatGPT demonstrated remarkable performance, achieving a passing grade on a radiology examination for medical students that did not include image questions.

· GPT-4 exhibits significantly improved performance compared to GPT-3.5 and Perplexity AI, answering 88% of questions correctly.

· Radiologists as well as medical students should be aware of ChatGPT's limitations, including its tendency to confidently provide incorrect responses.

· Gotta J, Le Hong QA, Koch V et al. Large language models (LLMs) in radiology exams for medical students: Performance and consequences. Fortschr Röntgenstr 2024; DOI 10.1055/a-2437-2067.
