Abstract
Scoring procedures for rater-mediated writing assessments often include checks for agreement between the raters who score students’ essays. When raters assign non-adjacent ratings to the same essay, a third rater is often employed to “resolve” the discrepant ratings. The procedures for flagging essays for score resolution are similar to person fit analyses based on item response theory (IRT). We used data from two writing performance assessments in science and social studies to explore the correspondence between traditional score resolution procedures and IRT person fit statistics. We observed that rater agreement criteria and person fit criteria flag many, but not all, of the same rating profiles for additional investigation. We also observed significantly different values of person fit statistics between students whose essays were and were not flagged for third ratings by the rater agreement criteria. Finally, when we used resolved ratings in place of the original ratings, we observed improvements in person fit for most, but not all, of the students whose essays were flagged for third ratings. These results suggest that person fit analyses may provide a complementary approach to rater agreement criteria. We discuss these results in terms of their implications for research and practice.
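The following is a minimal sketch, not the authors' code, of the two mechanisms the abstract contrasts: the rater agreement rule that flags non-adjacent ratings for a third (resolution) rating, and a generic IRT-style person fit index (an outfit-type mean square of standardized residuals). All data values, the 1-4 rubric, and the specific fit index are illustrative assumptions rather than details taken from the study.

```python
# Minimal sketch: rater-agreement flagging rule and a generic person fit index.
# The rubric range, example ratings, and model-expected values below are
# hypothetical placeholders, not data from the study.

def flag_nonadjacent(rating_1: int, rating_2: int, max_gap: int = 1) -> bool:
    """Flag an essay for third-rater resolution when the two original
    ratings differ by more than one scale point (i.e., are non-adjacent)."""
    return abs(rating_1 - rating_2) > max_gap


def outfit_mean_square(observed, expected, variance) -> float:
    """Mean of squared standardized residuals across a student's ratings;
    values far from 1.0 suggest person misfit (a generic IRT fit index,
    not necessarily the statistic used in the study)."""
    z_squared = [((x - e) ** 2) / v for x, e, v in zip(observed, expected, variance)]
    return sum(z_squared) / len(z_squared)


# Hypothetical example: two raters score one essay on a 1-4 rubric.
print(flag_nonadjacent(2, 4))   # True  -> non-adjacent, route to a third rater
print(flag_nonadjacent(3, 4))   # False -> adjacent, no resolution needed

# Hypothetical person fit check for one student across several rated tasks.
obs = [2, 4, 3]
exp = [2.8, 2.9, 3.1]           # model-expected ratings (placeholder values)
var = [0.60, 0.70, 0.65]        # model variances (placeholder values)
print(round(outfit_mean_square(obs, exp, var), 2))
```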