How sure can we be that a student really failed? On the measurement precision of individual pass-fail decisions from the perspective of Item Response Theory

Stefan K Schauber, Martin Hecht

doi:10.1080/0142159x.2020.1811844

Abstract

Background: In high-stakes assessments in medical education, the decision to let a particular participant pass or fail has far-reaching consequences. Reliability coefficients are usually used to support the trustworthiness of assessments and their accompanying decisions. However, coefficients such as Cronbach’s Alpha do not indicate the precision with which an individual’s performance was measured.

Objective: Since estimates of precision need to be aligned with the level on which inferences are made, we illustrate how to adequately report the precision of pass-fail decisions for single individuals.

Method: We show how to calculate the precision of individual pass-fail decisions using Item Response Theory and illustrate that approach using a real exam. In total, 70 students sat this exam (110 items). Reliability coefficients exceeded the recommended threshold for high-stakes tests (> 0.80). At the same time, pass-fail decisions around the cut score were expected to show low accuracy.

Conclusions: Our results illustrate that the most important decisions, i.e. those based on scores near the pass-fail cut score, are often ambiguous, and that reporting a traditional reliability coefficient is not an adequate description of the uncertainty encountered on an individual level.
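To make the idea of decision-level precision concrete, the following is a minimal sketch and not necessarily the authors' exact computation. It assumes that an IRT model has already produced an ability estimate and a standard error for each examinee, and that the uncertainty around the estimate is approximately normal. The function names, the cut score of 0.0, and the standard error of 0.3 are hypothetical values chosen for illustration only.

```python
import math


def prob_above_cut(theta_hat, se, cut):
    """Probability that the true ability is at or above the cut score.

    Assumes (for illustration) a normal distribution centred on the
    ability estimate theta_hat with standard deviation se.
    """
    z = (theta_hat - cut) / se
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def classify(theta_hat, se, cut):
    """Return the pass-fail decision and the probability that it is correct."""
    p_pass = prob_above_cut(theta_hat, se, cut)
    decision = "pass" if theta_hat >= cut else "fail"
    confidence = p_pass if decision == "pass" else 1.0 - p_pass
    return decision, confidence


if __name__ == "__main__":
    CUT = 0.0  # hypothetical cut score on the ability scale
    SE = 0.3   # hypothetical standard error of the ability estimate
    for theta in (-1.0, -0.1, 0.1, 1.0):
        decision, confidence = classify(theta, SE, CUT)
        print(f"theta={theta:+.1f}  decision={decision}  confidence={confidence:.2f}")
```

Under these assumptions, an examinee whose estimate lies far from the cut score is classified with near certainty, whereas one whose estimate lies a third of a standard error below the cut score is classified as failing with a confidence only modestly above one half, even if the test-level reliability coefficient is high.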

Full Text