Abstract
The intra-rater reliability in essay rating is usually indexed by the inter-rater correlation. We suggest an alternative method for estimating intra-rater reliability, within the framework of classical test theory, using the disattenuation formula for inter-test correlations. The validity of the method is demonstrated by extensive simulations and by applying it to an empirical dataset. We recommend this estimation method whenever the emphasis is not on the average intra-rater reliability of a group of raters but on the intra-rater reliability of a specific rater, e.g., when the error-variance component of the scores is needed in order to estimate true scores.
Highlights
The rating of essays written as a response to a given prompt is a complex cognitive task that encompasses many subtasks
We suggest an alternative method for estimating intra-rater reliability, within the framework of classical test theory, using the disattenuation formula for inter-test correlations
It applies in every situation that calls for human rating, be it in the context of K-12 writing or in the context of open-ended questions, for which there are agreed-upon scoring rubrics
Summary
The rating of essays written in response to a given prompt is a complex cognitive task that encompasses many subtasks. Raters remain highly diverse even after they have undergone a long training period, and this diversity is reflected in the final numerical ratings. Raters differ in their leniency/strictness, in their tendency to use (or not use) the full range of the rating scale, and in the consistency with which they rate the essays (e.g., as captured by the Hierarchical Rater Model; Patz et al., 2002). Intra-rater reliability is traditionally estimated by having the rater read and evaluate each paper more than once. This is seldom implemented, both because of its cost and because two readings of the same essay by the same rater cannot be considered genuinely independent. The purpose of this paper is to suggest a simple way to estimate intra-rater reliability and to test its adequacy using both simulated and real data.
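The disattenuation logic behind this kind of estimator can be illustrated with a short simulation. The sketch below is not the paper's exact estimator; it assumes the classical-test-theory setup described above (each rater's score = a common true score + independent rater-specific error) and uses the standard three-measure identity that, under that model, the pairwise inter-rater correlations satisfy r_jk = sqrt(rel_j * rel_k), so rater 1's reliability can be recovered as r12 * r13 / r23. The rater labels and error standard deviations are hypothetical values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # large sample so the correlation estimates are stable

# Classical test theory: observed score = common true score + independent error.
true_scores = rng.normal(0.0, 1.0, n)
err_sd = {"r1": 0.5, "r2": 0.8, "r3": 1.0}  # hypothetical per-rater error SDs
scores = {k: true_scores + rng.normal(0.0, s, n) for k, s in err_sd.items()}

def corr(a, b):
    """Pearson correlation between two score vectors."""
    return np.corrcoef(a, b)[0, 1]

r12 = corr(scores["r1"], scores["r2"])
r13 = corr(scores["r1"], scores["r3"])
r23 = corr(scores["r2"], scores["r3"])

# Under the model, r_jk = sqrt(rel_j * rel_k); solving for rater 1:
rel1_hat = r12 * r13 / r23

# Population reliability of rater 1: var(T) / (var(T) + var(E1)).
rel1_true = 1.0 / (1.0 + err_sd["r1"] ** 2)

print(f"estimated reliability of rater 1: {rel1_hat:.3f}")
print(f"population reliability of rater 1: {rel1_true:.3f}")
```

With this sample size the estimate lands close to the population value of 0.8, showing how a specific rater's reliability can be recovered from inter-rater correlations alone, without any rater rereading the same essays.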