Abstract

Human raters are normally involved in L2 performance assessment; as a result, rater behavior has been widely investigated to reduce rater effects on test scores and to provide validity arguments. Yet raters’ cognition and use of rubrics in their actual rating have rarely been explored qualitatively in L2 speaking assessments. In this study, three rater groups (novice, developing, and expert) were first operationalized on the basis of four background variables (rating experience, teaching experience, rater training, and educational background) to predict different levels of expertise in rating. The three groups of raters then evaluated 18 ESL learners’ oral responses using an analytic scoring rubric on three occasions, separated by one-month intervals. Recorded verbal report data were analyzed (a) to compare rater behavior across the three groups and (b) to examine the development of rating performance within each group over time. The analysis revealed that the three groups of raters from different backgrounds showed varying levels of rating ability and improved their rating performance at different paces. The findings suggest that a comprehensive consideration of rater characteristics contributes to a better understanding of raters’ differing needs for training and rating.
