Abstract

Human raters are normally involved in L2 performance assessment; as a result, rater behavior has been widely investigated to reduce rater effects on test scores and to provide validity arguments. Yet raters’ cognition and use of rubrics in their actual rating have rarely been explored qualitatively in L2 speaking assessments. In this study, three rater groups (novice, developing, and expert) were first operationalized on the basis of four background variables (rating experience, teaching experience, rater training, and educational background) to predict different levels of expertise in rating. The three groups of raters then evaluated 18 ESL learners’ oral responses using an analytic scoring rubric on three occasions, separated by one-month intervals. Recorded verbal report data were analyzed (a) to compare rater behavior across the three groups and (b) to examine the development of rating performance within each group over time. The analysis revealed that the three groups of raters from different backgrounds showed varying levels of rating ability and improved their rating performance at different paces. The findings suggest that a comprehensive consideration of rater characteristics contributes to a better understanding of raters’ differing needs for training and rating.
