Abstract

Little is known about how LLM-generated questions compare to gold-standard, traditional formative assessments in their difficulty and discrimination parameters, properties valued in psychometric measurement. We follow a rigorous measurement methodology to compare a set of ChatGPT-generated questions, produced from one lesson summary in a textbook, to existing questions from a published Creative Commons textbook. To do this, we collected and analyzed responses from 207 test respondents who answered questions from both item pools and used a linking methodology to compare item response theory (IRT) properties between the two pools. We find that neither the difficulty nor the discrimination parameters of the 15 items in each pool differ statistically significantly, with some evidence that the ChatGPT items were marginally better at differentiating respondent abilities. Response time also does not differ significantly between the two sources of items. The ChatGPT-generated items showed evidence of unidimensionality and did not affect the unidimensionality of the original item set when the two were tested together. Finally, through a fine-grained learning-objective labeling analysis, we found greater similarity between the learning-objective distribution of the ChatGPT-generated items and that of the target OpenStax lesson (0.9666) than between the ChatGPT-generated items and adjacent OpenStax lessons (0.6859 for the previous lesson and 0.6153 for the subsequent lesson). These results support our conclusion that generative AI can produce algebra items of similar quality to existing textbook questions that measure the same construct or constructs.
