Abstract

The purpose of this study was to compare the quality of multiple-choice questions (MCQs) developed using automated item generation (AIG) versus traditional methods, as judged by a panel of content experts in a blinded study. Participants rated a total of 102 MCQs on six quality metrics and judged whether each item tested recall or application of knowledge. A Wilcoxon two-sample test evaluated differences between the two development methods on each of the six quality-metric rating scales as well as on the overall cognitive-domain judgment. No significant differences were found in item quality or in the cognitive domain assessed. The vast majority of items (> 90%) developed with either method were deemed to assess higher-order skills. MCQs developed using AIG thus demonstrated quality comparable to traditionally developed items, and both modalities can produce items that assess higher-order cognitive skills.
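To make the analysis concrete, the following is a minimal sketch of a Wilcoxon rank-sum comparison on a single quality-metric rating scale, using SciPy. The ratings shown are invented placeholders, not the study's data.

```python
# Hedged sketch: Wilcoxon rank-sum test comparing expert ratings of
# AIG vs. traditionally developed items on one quality metric.
# All ratings below are hypothetical, for illustration only.
from scipy.stats import ranksums

aig_ratings = [4, 3, 4, 4, 3, 4, 2, 4, 3, 4]          # hypothetical rating scale
traditional_ratings = [4, 4, 3, 4, 3, 3, 4, 4, 3, 4]  # hypothetical rating scale

stat, p_value = ranksums(aig_ratings, traditional_ratings)
print(f"rank-sum statistic = {stat:.3f}, p = {p_value:.3f}")
# A non-significant p-value, as the study reports, indicates no detectable
# difference in rated quality between the two development methods.
```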

Highlights

  • In recent years, automated item generation (AIG) has been increasingly used to create multiple-choice questions (MCQs) for the assessment of health professionals (Gierl, Lai, & Turner, 2012; Lai, Gierl, Byrne, Spielman, & Waldschmidt, 2016).

  • AIG relies on the use of computer algorithms to generate a large number of MCQs by inputting and coding information derived from a cognitive model (Gierl et al., 2012).

  • Table 3 provides the frequency distributions for the six quality-metric rating scales by item modality (AIG vs. traditionally developed).

Introduction

In recent years, automated item generation (AIG) has been increasingly used to create multiple-choice questions (MCQs) for the assessment of health professionals (Gierl, Lai, & Turner, 2012; Lai, Gierl, Byrne, Spielman, & Waldschmidt, 2016). This move is driven in part by changes to the assessment landscape that have led educators to seek ways to provide more frequent testing opportunities. AIG relies on a cognitive model of how experts reason through a problem. For example, if a clinician is asked to articulate their approach to a patient presenting with hyponatremia, they will identify the factors that allow them to diagnose and manage the patient. These factors may include historical features (e.g., recent fluid intake/losses or medication use), physical examination findings (e.g., volume status), and laboratory results (e.g., urinary sodium). The resulting cognitive model accounts for how these factors vary across presentations and can be translated into code to generate MCQs through linear optimization (Gierl & Lai, 2013).
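To illustrate the generation step, here is a minimal sketch of template-based item generation driven by a simplified cognitive model. The feature values and stem wording are illustrative inventions; production AIG systems (e.g., Gierl & Lai, 2013) use richer cognitive models and constraint-based assembly such as linear optimization rather than exhaustive enumeration.

```python
# Hedged sketch: enumerate the feature combinations of a simplified
# cognitive model and render each combination into an MCQ stem.
# The feature values and stem template are invented for illustration.
from itertools import product

cognitive_model = {
    "history":  ["recent vomiting and poor fluid intake", "thiazide diuretic use"],
    "exam":     ["hypovolemic", "euvolemic"],
    "urine_na": ["urinary sodium < 20 mmol/L", "urinary sodium > 40 mmol/L"],
}

stem = ("A patient presents with hyponatremia and reports {history}. "
        "On examination the patient appears {exam}, and laboratory testing "
        "shows {urine_na}. What is the most appropriate next step in management?")

# A real generator would also filter clinically implausible combinations
# and attach a keyed answer plus distractors to each generated stem.
for history, exam, urine_na in product(*cognitive_model.values()):
    print(stem.format(history=history, exam=exam, urine_na=urine_na))
    print()
```

This toy model yields eight candidate stems (2 × 2 × 2 feature values); scaling the number of features and values in the cognitive model is what allows AIG to produce large item banks from a single template.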
