Abstract

Scoring procedures for rater-mediated writing assessments often include checks for agreement between the raters who score students’ essays. When raters assign non-adjacent ratings to the same essay, a third rater is often employed to “resolve” the discrepant ratings. The procedures for flagging essays for score resolution are similar to person fit analyses based on item response theory (IRT). We used data from two writing performance assessments in science and social studies to explore the correspondence between traditional score resolution procedures and IRT person fit statistics. We observed that rater agreement criteria and person fit criteria flag many, but not all, of the same rating profiles for additional investigation. We also observed significantly different values of person fit statistics between students whose essays were and were not flagged for third ratings by the rater agreement criteria. Finally, when we used resolved ratings in place of the original ratings, we observed improvements in person fit for most, but not all, of the students whose essays were flagged for third ratings. These results suggest that person fit analyses may provide a complementary approach to rater agreement criteria. We discuss these results in terms of their implications for research and practice.
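
As a rough sketch of the two flagging mechanisms the study compares, the Python snippet below first flags essays whose two ratings are non-adjacent (a common rater agreement criterion) and then computes an outfit-style mean-square person fit statistic from model-expected scores. The 1–4 rubric, the data values, the specific outfit formulation, and the function names are illustrative assumptions, not details taken from the paper; in practice the expected scores and variances would come from a fitted polytomous IRT or Rasch-family model.

```python
import numpy as np

# Hypothetical data: two raters score each essay on a 1-4 rubric.
# An essay is flagged for a third "resolution" rating when the two
# ratings are non-adjacent (differ by more than one score point).
ratings = np.array([
    [3, 3],   # exact agreement
    [2, 3],   # adjacent agreement
    [1, 3],   # non-adjacent -> flag for resolution
    [4, 2],   # non-adjacent -> flag for resolution
])
flag_for_resolution = np.abs(ratings[:, 0] - ratings[:, 1]) > 1
print(flag_for_resolution)  # [False False  True  True]


def outfit_mean_square(observed, expected, variance):
    """Unweighted mean-square (outfit) person fit statistic:
    the mean squared standardized residual across one student's ratings.
    Values well above 1.0 suggest an unexpectedly inconsistent profile."""
    z_squared = (observed - expected) ** 2 / variance
    return z_squared.mean()


# Illustrative values for one student's essay ratings; expected scores
# and variances are placeholders standing in for model-based estimates.
observed = np.array([3.0, 1.0, 4.0])
expected = np.array([2.8, 2.9, 3.1])
variance = np.array([0.60, 0.70, 0.65])
print(round(outfit_mean_square(observed, expected, variance), 2))
```

Comparing which essays each rule flags, as the study does, amounts to cross-tabulating `flag_for_resolution` against a cutoff on the person fit statistic.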
