Abstract

This article examines how item response theory (IRT) models perform when double ratings, rather than single ratings, are used as item scores in the presence of rater effects. Study 1 examined how the number of ratings influences the accuracy of proficiency estimation under the generalized partial credit model (GPCM). Study 2 compared the accuracy of proficiency estimation for double ratings under two IRT models: the GPCM and the hierarchical rater model (HRM). The main findings were as follows: (a) rater effects substantially reduced the accuracy of IRT proficiency estimation; (b) double ratings mitigated the negative impact of rater effects on proficiency estimation and improved accuracy relative to single ratings; (c) the IRT estimators showed different patterns of conditional accuracy; (d) the accuracy of proficiency estimation improved as more items and more score categories were used; and (e) the HRM consistently outperformed the GPCM.
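The abstract does not restate the GPCM. For reference, here is a minimal sketch of its standard category-probability function: P(X = k | theta) is proportional to exp(sum over v = 1..k of a(theta - b_v)), where the k = 0 term is an empty sum. The function name and parameter values are illustrative assumptions. This is not the authors' estimation code, and it does not model rater effects, which the HRM adds as a separate rater layer.

```python
import math


def gpcm_probs(theta, a, b):
    """Return P(X = k | theta) for k = 0..len(b) under the GPCM.

    theta: examinee proficiency
    a: item discrimination (illustrative value supplied by the caller)
    b: step parameters b_1..b_m (hypothetical example values)
    """
    # Running sums of a * (theta - b_v); category 0 uses the empty sum, 0.
    logits = [0.0]
    for b_v in b:
        logits.append(logits[-1] + a * (theta - b_v))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Example: a four-category item (scores 0-3) for an examinee at theta = 0.5.
probs = gpcm_probs(theta=0.5, a=1.2, b=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

Adding more step parameters yields more score categories. This is one way to see finding (d): each item can then carry more information about proficiency.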
