Abstract

When the raters of constructed-response items, such as writing samples, disagree on the level of proficiency exhibited in an item, testing agencies must resolve the score discrepancy before computing an operational score for release to the public. Several forms of score resolution are used throughout the assessment industry. In this study, we selected 4 of the more common forms of score resolution that were reported in a national survey of testing agencies and investigated the effect that each form of resolution has on the interrater reliability associated with the resulting operational scores. It is shown that some forms of resolution can be associated with higher reliability than other forms and that some forms may be associated with artificially inflated interrater reliability. Moreover, it is shown that the choice of resolution method may affect the percentage of papers that are defined as passing in a high-stakes assessment.
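
To make the idea of "form of resolution" concrete, the sketch below applies three hypothetical resolution rules to a handful of discrepant rating pairs and shows how the resulting operational scores and pass rates can differ. The rule names, the 1-6 rubric, the adjudicator scores, and the cut score are illustrative assumptions only; they are not the four forms studied in the paper.

```python
import numpy as np

# Hypothetical discrepant ratings: each row is (rater 1, rater 2) on a 1-6 rubric.
# All values below are invented for illustration, not data from the study.
ratings = np.array([[3, 5], [2, 4], [4, 6], [3, 4], [1, 3]])
adjudicator = np.array([4, 3, 5, 4, 2])  # hypothetical third reading of each paper
cut_score = 4.0                          # illustrative passing standard

def rater_mean(pair, third):
    """Average the two original ratings; the adjudicator is not used."""
    return pair.mean()

def closer_to_adjudicator(pair, third):
    """Average the adjudicator's score with the original rating nearest to it."""
    closer = pair[np.argmin(np.abs(pair - third))]
    return (closer + third) / 2

def adjudicator_override(pair, third):
    """Replace both original ratings with the adjudicator's score."""
    return float(third)

rules = [("rater mean", rater_mean),
         ("closer-to-adjudicator", closer_to_adjudicator),
         ("adjudicator override", adjudicator_override)]

for name, rule in rules:
    operational = np.array([rule(pair, t) for pair, t in zip(ratings, adjudicator)])
    pass_rate = (operational >= cut_score).mean()
    print(f"{name:>22}: scores = {operational}, pass rate = {pass_rate:.0%}")
```

Even in this toy setting, the three rules assign different operational scores to the same papers and therefore classify different proportions of papers as passing, which is the kind of effect the abstract describes for the resolution methods actually studied.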
