Abstract

We derive a general structure that encompasses important coefficients of interrater agreement such as the S-coefficient, Cohen's kappa, Scott's pi, Fleiss' kappa, Krippendorff's alpha, and Gwet's AC1. We show that these coefficients share the same set of assumptions about rater behavior; they only differ in how the unobserved category proportions are estimated. We incorporate Bayesian estimates of the category proportions and propose a new agreement coefficient with uniform prior beliefs. To correct for guessing in the process of item classification, the new coefficient emphasizes equal category probabilities if the observed frequencies are unstable due to a small sample, and the frequencies increasingly shape the coefficient as they become more stable. The proposed coefficient coincides with the S-coefficient for the hypothetical case of zero items; it converges to Scott's pi, Fleiss' kappa, and Krippendorff's alpha as the number of items increases. We use simulation to show that the proposed coefficient is as good as extant coefficients if the category proportions are equal and that it performs better if the category proportions are substantially unequal.
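The abstract does not state the estimator in closed form, but the behavior it describes (equal category probabilities when few items are available, observed frequencies dominating as the sample grows) is consistent with a posterior-mean estimate of the category proportions under a symmetric, uniform Dirichlet prior. The sketch below illustrates that general chance-corrected structure for two raters on nominal data; the function name, the prior_weight parameter, and the Laplace-style smoothing are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def chance_corrected_agreement(ratings, n_categories, prior_weight=1.0):
    """Chance-corrected agreement for two raters on nominal data.

    ratings: array-like of shape (n_items, 2) with integer labels 0..C-1.
    prior_weight: pseudo-count per category; 1.0 mimics a uniform Dirichlet
    prior and is an assumed stand-in for the paper's Bayesian estimate.
    """
    ratings = np.asarray(ratings)

    # Observed agreement: fraction of items on which the two raters agree.
    p_o = np.mean(ratings[:, 0] == ratings[:, 1])

    # Category counts pooled over both raters.
    counts = np.bincount(ratings.ravel(), minlength=n_categories)

    # Posterior-mean category proportions under a symmetric Dirichlet prior:
    # equal to 1/C when no items have been rated, and approaching the
    # observed pooled proportions as the number of items grows.
    pi_hat = (counts + prior_weight) / (counts.sum() + prior_weight * n_categories)

    # Expected chance agreement and the chance-corrected coefficient.
    p_e = np.sum(pi_hat ** 2)
    return (p_o - p_e) / (1 - p_e)

if __name__ == "__main__":
    # Two raters classify four items into two categories.
    ratings = [[0, 0], [0, 1], [1, 1], [0, 0]]
    print(chance_corrected_agreement(ratings, n_categories=2))
```

Under these assumptions, setting prior_weight to 0 recovers a Scott's-pi-style chance correction based on the pooled observed proportions, while a very large prior_weight drives the expected agreement toward 1/C, the value used by the S-coefficient; this mirrors the limiting behavior described in the abstract.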
