Abstract

When the raters of constructed-response items, such as writing samples, disagree on the level of proficiency exhibited in an item, testing agencies must resolve the score discrepancy before computing an operational score for release to the public. Several forms of score resolution are used throughout the assessment industry. In this study, we selected 4 of the more common forms of score resolution that were reported in a national survey of testing agencies and investigated the effect that each form of resolution has on the interrater reliability associated with the resulting operational scores. It is shown that some forms of resolution can be associated with higher reliability than other forms and that some forms may be associated with artificially inflated interrater reliability. Moreover, it is shown that the choice of resolution method may affect the percentage of papers that are defined as passing in a high-stakes assessment.
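
To make the idea of "form of resolution" concrete, the sketch below applies three hypothetical resolution rules to a handful of discrepant rating pairs and shows how the resulting operational scores and pass rates can differ. The rule names, the 1-6 rubric, the adjudicator scores, and the cut score are illustrative assumptions only; they are not the four forms studied in the paper.

```python
import numpy as np

# Hypothetical discrepant ratings: each row is (rater 1, rater 2) on a 1-6 rubric.
# All values below are invented for illustration, not data from the study.
ratings = np.array([[3, 5], [2, 4], [4, 6], [3, 4], [1, 3]])
adjudicator = np.array([4, 3, 5, 4, 2])  # hypothetical third reading of each paper
cut_score = 4.0                          # illustrative passing standard

def rater_mean(pair, third):
    """Average the two original ratings; the adjudicator is not used."""
    return pair.mean()

def closer_to_adjudicator(pair, third):
    """Average the adjudicator's score with the original rating nearest to it."""
    closer = pair[np.argmin(np.abs(pair - third))]
    return (closer + third) / 2

def adjudicator_override(pair, third):
    """Replace both original ratings with the adjudicator's score."""
    return float(third)

rules = [("rater mean", rater_mean),
         ("closer-to-adjudicator", closer_to_adjudicator),
         ("adjudicator override", adjudicator_override)]

for name, rule in rules:
    operational = np.array([rule(pair, t) for pair, t in zip(ratings, adjudicator)])
    pass_rate = (operational >= cut_score).mean()
    print(f"{name:>22}: scores = {operational}, pass rate = {pass_rate:.0%}")
```

Even in this toy setting, the three rules assign different operational scores to the same papers and therefore classify different proportions of papers as passing, which is the kind of effect the abstract describes for the resolution methods actually studied.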
