Abstract

Introduction

Multiple-choice questions (MCQs) are a cornerstone of assessment in medical education. Monitoring item properties (difficulty and discrimination) is an important means of investigating examination quality. However, most item property guidelines were developed for use on large cohorts of examinees; little empirical work has investigated the suitability of applying guidelines to item difficulty and discrimination coefficients estimated for small cohorts, such as those in medical education. We investigated the extent to which item properties vary across multiple clerkship cohorts to better understand the appropriateness of using such guidelines with small cohorts.

Methods

Exam results for 32 items from an MCQ exam were used. Item discrimination and difficulty coefficients were calculated for 22 cohorts (n = 10–15 students). Discrimination coefficients were categorized according to Ebel and Frisbie (1991). Difficulty coefficients were categorized according to three guidelines by Laveault and Grégoire (2014). Descriptive analyses examined variance in item properties across cohorts.

Results

A large amount of variance in item properties was found across cohorts. Discrimination coefficients varied greatly across cohorts, with 29/32 (91%) of items falling in both Ebel and Frisbie’s ‘poor’ and ‘excellent’ categories and 19/32 (59%) of items falling in all five categories. For item difficulty coefficients, the application of different guidelines resulted in large variations in examination length (the number of items removed ranged from 0 to 22).

Discussion

While the psychometric properties of items can provide information on item and exam quality, they vary greatly in small cohorts. The application of guidelines to small exam cohorts should be approached with caution.

Highlights

  • Multiple-choice questions (MCQs) are a cornerstone of assessment in medical education

  • As a first step in understanding whether we can use item analysis guidelines to inform our decision processes in small assessment cohorts, this study investigated the amount of variance observed in item properties when an MCQ examination was administered to several sequential clerkship cohorts

  • This study documented large amounts of variance in item difficulty and discrimination coefficients in multiple-choice items repeatedly used in small cohorts of learners

Summary

Introduction

Multiple-choice questions (MCQs) are a cornerstone of assessment in medical education. Monitoring item properties (difficulty and discrimination) is an important means of investigating examination quality. Most item property guidelines were developed for use on large cohorts of examinees; little empirical work has investigated the suitability of applying guidelines to item difficulty and discrimination coefficients estimated for small cohorts, such as those in medical education. Multiple-choice based examinations require careful monitoring to ensure continued item and examination quality, and a credible final score on which to base educational judgments – including decisions regarding gate-keeping and remediation. One means of monitoring is to rely on item statistics or item properties, such as difficulty and discrimination coefficients. These item properties can be derived after exam administration, and are available to help administrators judge the quality of individual items and make decisions regarding the composition of the final examination score [12]. If an item’s properties do not meet predetermined standards, the item may be excluded from the final score to derive a more appropriate final score, and re-evaluated for later use (either maintained or removed from an ‘item bank’).
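To make the two item properties concrete, the sketch below computes them under common classical test theory conventions: difficulty as the proportion of examinees answering the item correctly (the p-value), and discrimination as the point-biserial correlation between the item score and the total exam score. This is one standard formulation, not necessarily the exact coefficient used in the study, and the cohort data shown is hypothetical.

```python
import statistics

def item_difficulty(responses):
    """Classical difficulty: proportion of examinees who answered correctly (0/1 scores)."""
    return sum(responses) / len(responses)

def item_discrimination(responses, total_scores):
    """Point-biserial correlation between item score (0/1) and total exam score.
    A common discrimination index; other indices (e.g. upper-lower group D) exist."""
    n = len(responses)
    mean_r = statistics.mean(responses)
    mean_t = statistics.mean(total_scores)
    sd_r = statistics.pstdev(responses)
    sd_t = statistics.pstdev(total_scores)
    if sd_r == 0 or sd_t == 0:
        return 0.0  # no variance (everyone right/wrong): index is undefined
    cov = sum((r - mean_r) * (t - mean_t)
              for r, t in zip(responses, total_scores)) / n
    return cov / (sd_r * sd_t)

# Hypothetical cohort of 5 examinees on one item (1 = correct, 0 = incorrect)
item = [1, 1, 0, 1, 0]
totals = [28, 25, 12, 30, 15]  # hypothetical total exam scores
print(round(item_difficulty(item), 2))            # → 0.6
print(round(item_discrimination(item, totals), 2))  # → 0.97
```

With cohorts of only 10–15 students, each examinee shifts these coefficients substantially, which is the instability the study documents.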

