Abstract

This study investigates rater variability with regard to examinees' levels and rating criteria in paired speaking assessment. Twelve raters completed rater training and scored 102 examinees' paired speaking performances using analytical rating criteria reflecting various features of paired speaking performance. The raters were fairly consistent in their overall ratings but differed in severity. Bias analyses using many-facet Rasch measurement revealed a higher level of rater bias interaction for the rating criteria than for the examinees' levels or the pairing type, which reflects the level difference between the two examinees in a pair. In particular, the most challenging rating category, Language Use, attracted significant bias interactions. However, the raters did not display more frequent bias interactions for the interaction-specific rating categories, such as Engaging with Interaction and Turn Organization. Furthermore, the raters tended to reverse their severity patterns across the rating categories. In the rater-by-examinee bias interactions, the raters showed bias toward the low-level examinees more frequently. However, no significant rater bias was found for the pairing type that consisted of a high-level and a low-level examinee. These findings have implications for rater training in paired speaking assessment.
