Abstract

Recent research on rater effects in performance assessments has increasingly focused on rater centrality, the tendency to assign scores clustering around the rating scale’s middle categories. In the present paper, we adopted Jin and Wang’s (2018) extended facets modeling approach and constructed a centrality continuum, ranging from raters exhibiting strong central tendencies to raters exhibiting strong tendencies in the opposite direction (extremity). In two simulation studies, we examined three model-based centrality detection indices (rater infit statistics, residual–expected correlations, and rater threshold SDs) as well as the raw-score SD in terms of how efficiently they reconstructed the true rater centrality rank ordering. Findings confirmed the superiority of the residual–expected correlation, rater threshold SD, and raw-score SD statistics, particularly when the examinee sample size was large and the number of scoring criteria was high. By contrast, the infit statistic results were much less consistent and, under conditions of large differences between criterion difficulties, suggested erroneous conclusions about raters’ central tendencies. Analyzing real rating data from a large-scale speaking performance assessment confirmed that infit statistics are unsuitable for identifying raters’ central tendencies. The discussion focuses on detecting centrality effects under different facets models and the indices’ implications for rater monitoring and fair performance assessment.
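Of the four indices the abstract names, the raw-score SD is the only model-free one: a rater who clusters scores around the scale midpoint produces a smaller spread of raw scores than a rater who uses the scale extremes. The sketch below illustrates this with hypothetical ratings on a 1–5 scale (the rater labels and score vectors are invented for illustration; they are not from the study's data).

```python
import statistics

# Hypothetical ratings on a 1-5 scale (illustrative data, not from the study)
ratings = {
    "rater_A": [3, 3, 2, 3, 4, 3, 3, 2],  # clusters around the middle: central tendency
    "rater_B": [1, 5, 2, 5, 1, 4, 5, 1],  # uses the scale extremes: extremity
}

# Raw-score SD per rater: a lower SD suggests a stronger central tendency
for rater, scores in ratings.items():
    print(f"{rater}: raw-score SD = {statistics.stdev(scores):.2f}")
```

The model-based indices (infit, residual–expected correlation, rater threshold SD) additionally require estimates from a fitted facets model, which is why their behavior depends on design factors such as sample size and criterion-difficulty spread, as the simulations show.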
