Examining the Efficacy of ChatGPT in Marking Short-Answer Assessments in an Undergraduate Medical Program

Leo Morjaria, Matthew Sibbald, Keyna Bracken, Mark Lee, Quang N Ngo, Levi Burns, Anthony J Levinson

doi:10.3390/ime3010004

Abstract

Traditional approaches to marking short-answer questions face limitations in timeliness, scalability, inter-rater reliability, and faculty time costs. Harnessing generative artificial intelligence (AI) to address some of these shortcomings is attractive. This study aims to validate the use of ChatGPT for evaluating short-answer assessments in an undergraduate medical program. Ten questions from the pre-clerkship medical curriculum were randomly chosen, and for each, six previously marked student answers were collected. These sixty answers were evaluated by ChatGPT in July 2023 under four conditions: with both a rubric and standard, with only a standard, with only a rubric, and with neither. ChatGPT displayed good Spearman correlations with a single human assessor (r = 0.6–0.7, p < 0.001) across all conditions, with the absence of a standard or rubric yielding the best correlation. Scoring differences were common (65–80%), but score adjustments of more than one point were less frequent (20–38%). Notably, the absence of a rubric resulted in systematically higher scores (p < 0.001, partial η² = 0.33). Our findings demonstrate that ChatGPT is a viable, though imperfect, assistant to human assessment, performing comparably to a single expert assessor. This study serves as a foundation for future research on AI-based assessment techniques with potential for further optimization and increased reliability.
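The agreement statistics named in the abstract (a Spearman rank correlation between human and ChatGPT marks, the share of answers scored differently, and the share differing by more than one point) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names and the six pairs of marks below are hypothetical and chosen only for illustration, and the sketch does not compute p-values or the partial η² effect size.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def disagreement_rates(human, model):
    """Return (share of answers scored differently, share differing by > 1 point)."""
    diffs = [abs(h - m) for h, m in zip(human, model)]
    any_diff = sum(d > 0 for d in diffs) / len(diffs)
    large_diff = sum(d > 1 for d in diffs) / len(diffs)
    return any_diff, large_diff


# Hypothetical marks for six answers, for illustration only (not study data).
human_scores = [3, 5, 2, 4, 1, 5]
model_scores = [4, 5, 2, 5, 3, 4]

rho = spearman(human_scores, model_scores)
any_diff, large_diff = disagreement_rates(human_scores, model_scores)
print(f"Spearman rho = {rho:.2f}")
print(f"Any difference: {any_diff:.0%}, more than one point: {large_diff:.0%}")
```

Ranking with averaged ties matters here because marks on a short integer scale tie often. In practice a statistics library would typically be used instead of hand-rolled ranking.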

Journal: International Medical Education
Publication Date: Jan 19, 2024
Citations: 3
License type: CC BY 4.0
