Abstract

The intra-rater reliability in rating essays is usually indexed by the inter-rater correlation. We suggest an alternative method for estimating intra-rater reliability, in the framework of classical test theory, by using the dis-attenuation formula for inter-test correlations. The validity of the method is demonstrated by extensive simulations and by applying it to an empirical dataset. We recommend this estimation method whenever the emphasis is not on the average intra-rater reliability of a group of raters but on the intra-rater reliability of a specific rater, e.g., when the error-variance component of that rater's scores is needed in order to estimate true scores.

Highlights

  • The rating of essays written as a response to a given prompt is a complex cognitive task that encompasses many subtasks

  • We suggest an alternative method for estimating intra-rater reliability, in the framework of classical test theory, by using the dis-attenuation formula for inter-test correlations

  • It applies in every situation that calls for human rating, be it in the context of K-12 writing or of open-ended questions for which there are agreed-upon scoring rubrics

Introduction

The rating of essays written as a response to a given prompt is a complex cognitive task that encompasses many subtasks. Great diversity remains among raters even after they have undergone a long training period, and this diversity is reflected in the final numerical ratings. Raters differ in their leniency/strictness, in their tendency to use (or not use) the full range of the rating scale, and in the consistency with which they rate the essays (e.g., as captured by the Hierarchical Rater Model; Patz et al., 2002). Intra-rater reliability is estimated by having the rater read and evaluate each paper more than once. This is seldom implemented, both because of its cost and because two readings of the same essay by the same rater cannot be considered genuinely independent. The purpose of this paper is to suggest a simple way to estimate intra-rater reliability and to test its adequacy using both simulated and real data.
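
To fix ideas, here is a minimal sketch of how the dis-attenuation formula can yield a rater-specific reliability estimate under classical test theory; the three-rater construction and the notation (raters A, B, C with reliabilities rel_A, rel_B, rel_C) are introduced here only for illustration and are not necessarily the exact estimator developed in the paper. If all raters' true scores are perfectly correlated (each rater measures the same underlying essay quality), the dis-attenuation formula reduces the observed inter-rater correlation to

    r_{AB} = \rho(T_A, T_B)\sqrt{\mathrm{rel}_A\,\mathrm{rel}_B} = \sqrt{\mathrm{rel}_A\,\mathrm{rel}_B},

and with three raters the reliability of a specific rater A is identified from observed correlations alone:

    \mathrm{rel}_A = \frac{r_{AB}\, r_{AC}}{r_{BC}}.

A small simulation, in the spirit of the paper's validation (the normal generating model and the reliability values below are our assumptions), confirms that this triad identity recovers a rater's reliability:

    import numpy as np

    rng = np.random.default_rng(0)
    n_essays = 100_000

    # True essay quality, shared by all raters under the CTT assumption.
    true_score = rng.normal(0.0, 1.0, n_essays)

    # Hypothetical rater-specific reliabilities (chosen for illustration).
    rel = {"A": 0.8, "B": 0.6, "C": 0.7}

    # Observed score = true score + noise; the noise variance (1 - rel) / rel
    # makes var(true) / var(observed) equal the target reliability.
    obs = {
        r: true_score + rng.normal(0.0, np.sqrt((1.0 - v) / v), n_essays)
        for r, v in rel.items()
    }

    def corr(x, y):
        return np.corrcoef(x, y)[0, 1]

    # Triad estimate of rater A's intra-rater reliability.
    rel_A_hat = (corr(obs["A"], obs["B"]) * corr(obs["A"], obs["C"])
                 / corr(obs["B"], obs["C"]))
    print(f"estimated rel_A = {rel_A_hat:.3f} (target 0.8)")

With 100,000 simulated essays the estimate lands very close to the target value of 0.8. The sketch also illustrates why a raw inter-rater correlation is a poor index of a specific rater's reliability: here it equals sqrt(0.8 * 0.6) ≈ 0.69, the geometric mean of the two raters' reliabilities, rather than either rater's own value.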
