Abstract

In writing assessment, finding a valid, reliable, and efficient scale is critical. Appropriate scales increase rater reliability and can also save time and money. This exploratory study compared the effects of a binary scale and an analytic scale across teacher raters and expert raters. The purpose of the study was to determine how different scale types affect rating performance and scores. The raters in this study rated twenty short EFL essays using the two scales, completed a rater cognition questionnaire, and took part in an in-depth interview. The ratings were analyzed using a multi-faceted Rasch analysis to compare essay scores and rater statistics across scales and rater groups. The results indicated that when using the binary scale, the raters spent less time, and their ratings were less spread out and more consistent. Three out of four raters reported that the binary scale required less mental effort and that they felt more confident in their ratings. Across the two rater groups, rating performance shifted more for the teacher raters than for the expert raters when the binary scale was used. This implies that scale design had a greater effect on teacher raters. The overall findings suggest that the binary scale may be a better fit for large-scale assessment, provided there is sufficient rater training.

Highlights

  • With the increase in performance-based language assessment, different scales are being developed for different assessment purposes

  • This study examines the effects of two different scales with two different rater groups and their interactions on estimates of student writing scores, rater agreement, rater severity, and self-consistency

  • Young (Z = 0.00, p = 1.00), Sue (Z = 0.69, p = 0.490), and Sean (Z = −1.382, p = 0.167) showed no significant differences, which implies that their ratings were similar across the two scales, but for Fred (Z = −2.620, p = 0.009), the Wilcoxon signed-rank test did show a significant difference between the scores from the binary scale and those from the analytic scale
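
The Z and p values reported above come from a Wilcoxon signed-rank test on each rater's paired scores. As a rough illustration of how such a statistic can be obtained, the sketch below uses the large-sample normal approximation with a tie correction and no continuity correction. The function name `wilcoxon_signed_rank_z` and the example scores are hypothetical; the study's actual data and software are not given in this excerpt, so the output is not expected to reproduce the reported values.

```python
import math


def wilcoxon_signed_rank_z(x, y):
    """Return (W_plus, z, two_sided_p) for paired samples x and y.

    Uses the normal approximation with a tie correction; zero
    differences are dropped before ranking (an assumption, since the
    study does not say how they were handled).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        # Identical paired scores: no evidence of any difference.
        return 0.0, 0.0, 1.0

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # Sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w_plus, z, p


if __name__ == "__main__":
    # Hypothetical scores from one rater on the same essays (not study data).
    binary_scores = [3, 4, 2, 5, 3, 4, 2, 3]
    analytic_scores = [4, 4, 3, 5, 5, 4, 3, 4]
    w, z, p = wilcoxon_signed_rank_z(binary_scores, analytic_scores)
    print(f"W+ = {w}, Z = {z:.3f}, p = {p:.3f}")
```

A negative Z here means the first set of scores tends to be lower than the second. With only twenty essays per rater, an exact test may be preferable to the normal approximation shown.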


Summary

Introduction

With the increase in performance-based language assessment, different scales are being developed for different assessment purposes. While there has been research on the effects of tasks and raters on ratings (Schoonen, 2005), there has been limited research (Bacha, 2001; Barkaoui, 2007, 2010, 2011; O’Loughlin, 1994; Song & Caruso, 1996) on how different scales impact rating performance and scores. For this reason, I compare two different scale types (binary and analytic) that were developed on the same assessment construct (i.e., paragraph structure, content, form, and vocabulary). I use a binary scale that is similar to the empirically derived, binary-choice, boundary-defined (EBB) scale originally developed by Turner and Upshur (2019)
