Abstract

Scoring procedures for the constructed-response (CR) items in large-scale mixed-format educational assessments often involve checks for rater agreement or rater reliability. Although these analyses are important, researchers have documented rater effects that persist despite rater training and that are not always detected in rater agreement and reliability analyses, such as severity/leniency, centrality/extremism, and biases. Left undetected, these effects pose threats to fairness. We illustrate how rater effects analyses can be incorporated into scoring procedures for large-scale mixed-format assessments. We used data from the National Assessment of Educational Progress (NAEP) to illustrate relatively simple analyses that can provide insight into patterns of rater judgment that may warrant additional attention. Our results suggested that the NAEP raters exhibited generally defensible psychometric properties, while also exhibiting some idiosyncrasies that could inform scoring procedures. Similar procedures could be used operationally to inform the interpretation and use of rater judgments in large-scale mixed-format assessments.
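
The abstract does not specify which analyses the authors used, and the paper likely relies on more formal psychometric models than shown here. Purely as a hedged illustration of the distinction the abstract draws, the Python sketch below contrasts routine agreement/reliability statistics with simple indicators of rater severity/leniency and centrality/extremism, using entirely hypothetical double-scored data; the variable names, score scale, and simulated ratings are assumptions, not the authors' data or method.

```python
# Hypothetical sketch (not the authors' procedure): contrast routine
# agreement/reliability checks with simple rater-effects indicators.
import numpy as np

rng = np.random.default_rng(0)

# Made-up double-scored CR item: 200 responses, two raters, 0-3 score scale.
true_quality = rng.integers(0, 4, size=200)
rater_a = np.clip(true_quality + rng.choice([-1, 0, 0, 1], size=200), 0, 3)
rater_b = np.clip(true_quality + rng.choice([-1, 0, 0, 0, 1], size=200), 0, 3)

# (a) Agreement/reliability statistics commonly reported in operational scoring.
exact_agreement = np.mean(rater_a == rater_b)
adjacent_agreement = np.mean(np.abs(rater_a - rater_b) <= 1)
score_correlation = np.corrcoef(rater_a, rater_b)[0, 1]

# (b) Simple rater-effects indicators that agreement statistics can miss.
severity_a = rater_a.mean() - true_quality.mean()   # negative => severe, positive => lenient
severity_b = rater_b.mean() - true_quality.mean()
centrality_a = rater_a.std() / true_quality.std()   # < 1 suggests central tendency
centrality_b = rater_b.std() / true_quality.std()

print(f"Exact agreement:    {exact_agreement:.2f}")
print(f"Adjacent agreement: {adjacent_agreement:.2f}")
print(f"Score correlation:  {score_correlation:.2f}")
print(f"Severity (A, B):    {severity_a:+.2f}, {severity_b:+.2f}")
print(f"Centrality (A, B):  {centrality_a:.2f}, {centrality_b:.2f}")
```

Two raters can show high exact agreement and correlation while one is systematically lenient or compresses scores toward the middle of the scale, which is the kind of pattern the abstract notes may go undetected in agreement and reliability analyses alone.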
