Evaluation of Large language model performance on the Multi-Specialty Recruitment Assessment (MSRA) exam

Panagiotis Tsoutsanis, Aristotelis Tsoutsanis

doi:10.1016/j.compbiomed.2023.107794

Abstract

Introduction: AI-powered platforms have gained prominence in medical education and training, offering diverse applications from surgical performance assessment to exam preparation. This research paper examines the capabilities of Large Language Models (LLMs), including Llama 2, Google Bard, Bing Chat, and ChatGPT-3.5, in answering multiple-choice questions from the Clinical Problem Solving (CPS) paper of the Multi-Specialty Recruitment Assessment (MSRA) exam.

Methods: Using a dataset of 100 CPS questions from ten subject categories, we assessed the LLMs' performance against that of medical doctors preparing for the exam.

Results: Bing Chat outperformed all other LLMs and even surpassed human users of the Qbank question bank. Conversely, Llama 2's performance was inferior to that of human users. Google Bard and ChatGPT-3.5 did not exhibit statistically significant differences in correct response rates compared to human candidates. Pairwise comparisons demonstrated Bing Chat's significant superiority over Llama 2, Google Bard, and ChatGPT-3.5. However, no significant differences were found between Llama 2 and Google Bard, between Llama 2 and ChatGPT-3.5, or between Google Bard and ChatGPT-3.5.

Discussion: Freely available LLMs have already demonstrated that they can perform as well as, or even outperform, human users in answering MSRA exam questions. Bing Chat emerged as a particularly strong performer. The study also highlights the potential for enhancing LLMs' medical knowledge acquisition through tailored fine-tuning. LLMs tailored to medical knowledge, such as Med-PaLM, have already shown promising results.

Conclusion: We provided valuable insights into LLMs' competence in answering medical MCQs and their potential integration into medical education and assessment processes.
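The abstract reports pairwise comparisons of correct response rates but does not say which statistical test was used. As an illustration only, the sketch below shows one common way to compare two correct-response rates: a pooled two-proportion z-test. The function name `two_proportion_z_test` and the counts in the usage lines are hypothetical placeholders, not figures from the study. Because every model answered the same 100 questions, a paired test such as McNemar's might be more suitable; that choice is likewise an assumption.

```python
import math


def two_proportion_z_test(correct_a, total_a, correct_b, total_b):
    """Two-sided pooled z-test for a difference between two correct-response rates.

    Hypothetical helper for illustration; the source does not state its test.
    Returns (z statistic, two-sided p-value).
    """
    p_a = correct_a / total_a
    p_b = correct_b / total_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (correct_a + correct_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        # Both groups are all-correct or all-wrong: no observable difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Placeholder counts (NOT study data): model A 80/100 correct, model B 60/100.
z, p = two_proportion_z_test(80, 100, 60, 100)
print(f"z = {z:.3f}, p = {p:.4f}")
```

With several pairwise comparisons, a multiple-comparison correction (for example, Bonferroni) would normally be applied to the resulting p-values.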

Journal: Computers in Biology and Medicine
Publication Date: Nov 30, 2023
Citations: 9

Similar Papers

Performance of Large Language Models on a Neurology Board–Style Examination
Marc Cicero Schubert ... Varun Venkataramani
JAMA Network Open | VOL. 6
07 Dec 2023

The Application of Large Language Models in Gastroenterology: A Review of the Literature.
Marcello Maida ... Daryl Ramai
Cancers | VOL. 16
28 Sep 2024

Assessing the Risk of Bias in Randomized Clinical Trials With Large Language Models.
Honghao Lai ... Janne Estill
JAMA Network Open | VOL. 7
22 May 2024

A systematic review of large language models and their implications in medical education.
Harrison C Lucas ... Jamie R Robinson
Medical Education | VOL. 58
19 Apr 2024
