Examining the Threat of ChatGPT to the Validity of Short Answer Assessments in an Undergraduate Medical Program.

Leo Morjaria, Keyna Bracken, Matthew Sibbald, Penelope Thompson, Quang N Ngo, Mark Lee, Anthony J Levinson, Levi Burns, John Smith

doi:10.1177/23821205231204178

Abstract

ChatGPT is an artificial intelligence model that can interpret free-text prompts and return detailed, human-like responses across a wide range of subjects. This study evaluated the extent of the threat posed by ChatGPT to the validity of the short-answer assessment problems used to examine pre-clerkship medical students in our undergraduate medical education program. Forty problems used in prior student assessments were retrieved and stratified by level of Bloom's Taxonomy. Thirty of these problems were submitted to ChatGPT-3.5. For the remaining 10 problems, we retrieved past minimally passing student responses. Six tutors graded each of the 40 responses. Performance of student-generated and ChatGPT-generated answers, both aggregated as a whole and grouped by Bloom's level of cognitive reasoning, was compared using t-tests, ANOVA, Cronbach's alpha, and Cohen's d. Scores for ChatGPT-generated responses were also compared to historical class average performance. ChatGPT-generated responses received a mean score of 3.29 out of 5 (n = 30, 95% CI 2.93-3.65), compared to 2.38 for a group of students meeting minimum passing marks (n = 10, 95% CI 1.94-2.82), representing higher performance (P = .008, η² = 0.169). However, ChatGPT was outperformed by the historical class average scores on the same 30 problems (mean 3.67, P = .018), which include all past responses regardless of student performance level. There was no statistically significant trend in performance across domains of Bloom's Taxonomy. While ChatGPT was able to pass short-answer assessment problems spanning the pre-clerkship curriculum, it outperformed only underperforming students. In several cases, tutors were convinced that ChatGPT-generated responses had been written by students. Risks to assessment validity include uncertainty in identifying struggling students and an inability to intervene in a timely manner. The performance of ChatGPT on problems requiring increasing demands of cognitive reasoning warrants further research.
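
The reported effect size can be related to the reported means and confidence intervals with a short calculation. The sketch below is not the authors' analysis: it assumes the 95% confidence intervals are t-based intervals around each group mean and that the two groups were compared with a pooled-variance two-sample t-test, neither of which the abstract states. The critical values in `T_CRIT` are standard two-sided 95% values of Student's t, and all function and variable names are illustrative.

```python
import math

# Summary statistics as reported in the abstract.
chatgpt = {"mean": 3.29, "n": 30, "ci": (2.93, 3.65)}
students = {"mean": 2.38, "n": 10, "ci": (1.94, 2.82)}

# ASSUMPTION: the intervals are t-based, so we need the two-sided 95%
# critical value of Student's t for n - 1 degrees of freedom.
T_CRIT = {29: 2.04523, 9: 2.26216}


def sd_from_ci(group):
    """Back out the sample standard deviation from a t-based 95% CI."""
    low, high = group["ci"]
    half_width = (high - low) / 2
    standard_error = half_width / T_CRIT[group["n"] - 1]
    return standard_error * math.sqrt(group["n"])


def pooled_comparison(a, b):
    """Pooled-variance t statistic, eta squared, and Cohen's d (ASSUMED test)."""
    var_a = sd_from_ci(a) ** 2
    var_b = sd_from_ci(b) ** 2
    df = a["n"] + b["n"] - 2
    pooled_sd = math.sqrt(((a["n"] - 1) * var_a + (b["n"] - 1) * var_b) / df)
    mean_diff = a["mean"] - b["mean"]
    t_stat = mean_diff / (pooled_sd * math.sqrt(1 / a["n"] + 1 / b["n"]))
    # For a two-group comparison, eta squared = t^2 / (t^2 + df).
    eta_sq = t_stat ** 2 / (t_stat ** 2 + df)
    cohen_d = mean_diff / pooled_sd
    return t_stat, df, eta_sq, cohen_d


t_stat, df, eta_sq, cohen_d = pooled_comparison(chatgpt, students)
print(f"t({df}) = {t_stat:.2f}, eta squared = {eta_sq:.3f}, Cohen's d = {cohen_d:.2f}")
```

Under these assumptions the reconstructed η² comes out at roughly 0.17, close to the reported 0.169, which suggests the summary figures in the abstract are internally consistent. The small gap is expected, since the means and interval bounds are rounded to two decimals.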

Journal: Journal of Medical Education and Curricular Development
Publication Date: Jan 1, 2023
Citations: 6
License type: CC BY 4.0
