Evaluating Rater Accuracy in Performance Assessments

George Engelhard

doi:10.1111/j.1745-3984.1996.tb00479.x

Abstract

A new method for evaluating rater accuracy within the context of performance assessments is described. Accuracy is defined as the match between ratings obtained from operational raters and those obtained from an expert panel on a set of benchmark, exemplar, or anchor performances. An extended Rasch measurement model called the FACETS model is presented for examining rater accuracy. The FACETS model is illustrated with 373 benchmark papers rated by 20 operational raters and an expert panel. The data are from the 1993field test of the High School Graduation Writing Test in Georgia. The data suggest that there are statistically significant differences in rater accuracy; the data also suggest that it is easier to be accurate on some benchmark papers than on others. A small example is presented to illustrate how the accuracy ordering of raters may not be invariant over different subsets of benchmarks used to evaluate accuracy.

Full Text