Abstract

Writing assessments are an indispensable part of most language competency tests. In our research, we used cross-classified models to study rater effects in the real essay rating process of a large-scale, high-stakes educational examination administered in China in 2011. Specifically, four cross-classified models were formulated to address three research questions: (1) whether sequential effects existed, (2) the direction of the sequential effects, and (3) whether rater effects differed across raters with different individual characteristics. We applied these models to the data to account for possible cluster effects caused by the use of multiple rating strategies. The results showed that raters demonstrated sequential effects during the rating process. In contrast to many other studies on rater effects, our study found that raters exhibited assimilation effects, and the more experienced, lenient, and qualified raters were less susceptible to them. In addition, our research demonstrated the feasibility and appropriateness of using cross-classified models to assess rater effects for such data structures. This paper also discusses the implications for educators and practitioners who are interested in reducing sequential effects in the rating process, and suggests directions for future research.
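
As a rough illustration of the modeling approach (a minimal sketch with assumed names, not the paper's exact specification), a cross-classified model treats each score as belonging simultaneously to two non-nested clusters, the essay and the rater, so that both sources of variation enter as crossed random effects:

  score_ij = β0 + u_i + v_j + ε_ij,  with u_i ~ N(0, σ²_essay), v_j ~ N(0, σ²_rater), ε_ij ~ N(0, σ²_residual)

where score_ij is the score that rater j assigns to essay i. Predictors for sequential effects (for example, functions of a rater's previously assigned scores) and rater characteristics can then be added as fixed effects on top of this crossed random structure.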

Highlights

  • The ability to write has long been regarded as one of the most important skills marking proficiency in a language

  • Although experience had no significant main effect on the response variable [β4 = 0.017, 95% credible interval (CrI): (−0.017, 0.050)], it moderated the influence of highpro_9: a unit increase in experience reduced the effect of highpro_9 by 0.117 [β5 = −0.117, 95% CrI: (−0.209, −0.028)] (see the sketch after this list). These results suggest that even if raters with various levels of experience did not differ in severity or leniency, they did differ in their inclination to give scores influenced by the scores they had previously assigned

  • The results of this study strongly suggested that cross-classified models have an advantage over other methods for investigating rater effects in real essay rating processes for large-scale, high-stakes educational examinations
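
To make the moderation in the second highlight concrete, the sketch below shows the kind of interaction term it implies; highpro_9 and experience are the names used in the highlight, while β0, βh, the crossed random effects u_i and v_j, and the omission of other predictors are assumptions rather than the paper's full model:

  score_ij = β0 + βh·highpro_9_ij + β4·experience_j + β5·(highpro_9_ij × experience_j) + u_i + v_j + ε_ij

Under this form, the effect of highpro_9 for rater j is βh + β5·experience_j, so with β5 = −0.117 each additional unit of experience reduces the effect of highpro_9 on the current score by 0.117, which is the moderation reported above.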


Summary

Introduction

The ability to write has long been regarded as one of the most important skills marking proficiency in a language, and writing assessments are an indispensable part of most language tests. Writing assessments require examinees to write essays according to a set of instructions, and the essays are scored by human raters based on established rating scales. In an idealized, over-simplified view of the rating process, raters first internalize a set of stable and uniform standards and then apply them consistently. Although raters may be more lenient or more severe in how they enforce these standards, they are expected to treat all responses impartially, so that scores are not affected by construct-irrelevant characteristics such as the location of an essay in a sequence of responses or the personal preferences of raters.

