Abstract
The intra-rater reliability in essay rating is usually indexed by the inter-rater correlation. We suggest an alternative method for estimating intra-rater reliability, within the framework of classical test theory, using the disattenuation formula for inter-test correlations. The validity of the method is demonstrated by extensive simulations and by applying it to an empirical dataset. We recommend this estimation method whenever the emphasis is not on the average intra-rater reliability of a group of raters but on the intra-rater reliability of a specific rater, e.g., when the error-variance component of the scores is needed in order to estimate true scores.
Highlights
The rating of essays written as a response to a given prompt is a complex cognitive task that encompasses many subtasks
We suggest an alternative method for estimating intra-rater reliability, within the framework of classical test theory, using the disattenuation formula for inter-test correlations
It applies in every situation that calls for human rating, be it in the context of K-12 writing or in the context of open-ended questions, for which there are agreed-upon scoring rubrics
Summary
The rating of essays written in response to a given prompt is a complex cognitive task that encompasses many subtasks. Raters remain highly diverse even after they have undergone a long training period, and this diversity is reflected in the final numerical ratings. Raters differ in their leniency/strictness, in their tendency to use (or not use) the full range of the rating scale, and in the consistency with which they rate the essays (e.g., as captured by the Hierarchical Rater Model; Patz et al., 2002). Intra-rater reliability is traditionally estimated by having the rater read and evaluate each paper more than once. This is seldom implemented, both because of its cost and because two readings of the same essay by the same rater cannot be considered genuinely independent. The purpose of this paper is to suggest a simple way to estimate intra-rater reliability and to test its adequacy using both simulated and real data.
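The disattenuation logic behind this kind of estimator can be illustrated with a short simulation. The sketch below is not the paper's exact estimator; it assumes the classical-test-theory setup described above (each rater's score = a common true score + independent rater-specific error) and uses the standard three-measure identity that, under that model, the pairwise inter-rater correlations satisfy r_jk = sqrt(rel_j * rel_k), so rater 1's reliability can be recovered as r12 * r13 / r23. The rater labels and error standard deviations are hypothetical values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # large sample so the correlation estimates are stable

# Classical test theory: observed score = common true score + independent error.
true_scores = rng.normal(0.0, 1.0, n)
err_sd = {"r1": 0.5, "r2": 0.8, "r3": 1.0}  # hypothetical per-rater error SDs
scores = {k: true_scores + rng.normal(0.0, s, n) for k, s in err_sd.items()}

def corr(a, b):
    """Pearson correlation between two score vectors."""
    return np.corrcoef(a, b)[0, 1]

r12 = corr(scores["r1"], scores["r2"])
r13 = corr(scores["r1"], scores["r3"])
r23 = corr(scores["r2"], scores["r3"])

# Under the model, r_jk = sqrt(rel_j * rel_k); solving for rater 1:
rel1_hat = r12 * r13 / r23

# Population reliability of rater 1: var(T) / (var(T) + var(E1)).
rel1_true = 1.0 / (1.0 + err_sd["r1"] ** 2)

print(f"estimated reliability of rater 1: {rel1_hat:.3f}")
print(f"population reliability of rater 1: {rel1_true:.3f}")
```

With this sample size the estimate lands close to the population value of 0.8, showing how a specific rater's reliability can be recovered from inter-rater correlations alone, without any rater rereading the same essays.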