LLM-based automatic short answer grading in undergraduate medical education

Christian Grévisse

doi:10.1186/s12909-024-06026-5

Abstract

Background
Multiple-choice questions are heavily used in medical education assessments, but they rely on recognition rather than knowledge recall. Open questions avoid this limitation, but grading them is a time-intensive task for teachers. Automatic short answer grading (ASAG) has tried to fill this gap, and with the recent advent of Large Language Models (LLMs), the field has gained new momentum.

Methods
We graded 2288 student answers from 12 undergraduate medical education courses in 3 languages using GPT-4 and Gemini 1.0 Pro.

Results
GPT-4 proposed significantly lower grades than the human evaluator, but had a low rate of false positives. The grades of Gemini 1.0 Pro were not significantly different from the teachers’. Both LLMs reached moderate agreement with human grades, and GPT-4 achieved high precision among answers considered fully correct. Grading behavior was consistent for high-quality answer keys. A weak correlation was found with respect to the length or language of student answers. There is a risk of bias if the LLM knows the human grade a priori.

Conclusions
LLM-based ASAG applied to medical education still requires human oversight, but time can be saved on the edge cases, allowing teachers to focus on the intermediate ones. For Bachelor-level medical education questions, the training knowledge of LLMs seems to be sufficient; fine-tuning is thus not necessary.

Journal: BMC Medical Education
Publication Date: Sep 27, 2024
License type: cc-by-nc-nd

Similar Papers

Automatic Short Answer Grading for Finnish with ChatGPT
Li-Hsin Chang ... Filip Ginter
Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, 24 Mar 2024

Meta-analysis of PBL teaching effect of basic medical courses in undergraduate medical education
...
Chinese Journal of Medical Education Research, Vol. 13, 20 Jun 2014

A systematic review of factors influencing student ratings in undergraduate medical education course evaluations
Sarah Schiekirka ... Tobias Raupach
BMC Medical Education, Vol. 15, 05 Mar 2015

Effect of team-based learning on basic medical courses in undergraduate medical education: a Meta-analysis
...
Chinese Journal of Medical Education Research, Vol. 14, 20 Aug 2015