Abstract
Generative artificial intelligence (GenAI) tools developed to support teaching and learning are widely available. Trustworthiness concerns, however, have prompted calls for researchers to study their effectiveness and for educators and educational researchers to be involved in their creation and piloting. This study investigated one type of GenAI created to support educators: multiple-choice question generators (MCQ GenAI). Among the nine MCQ GenAI tools investigated, a variety of useful options were available, but only one indicated teacher involvement in its development, and none mentioned involving testing experts. MCQ GenAI-created items (n = 270) were coded against MCQ item-writing quality guidelines. Results showed 80.00% of items (n = 216) violated at least one guideline: 73.70% (n = 199) were likely to produce major measurement error (should not be used without revision) and 6.30% (n = 17) were likely to elicit minor measurement error (consider modifying); the remaining 20.00% (n = 54) were acceptable (usable as created). Implications suggest multidisciplinary teams are needed in educational GenAI tool development.