Abstract

Chance-corrected agreement coefficients such as the Cohen and Fleiss Kappas are commonly used to measure the consistency of decisions made by clinical observers or raters. However, the way in which they estimate the probability of agreement (Pe) or cost of disagreement (De) 'by chance' has been strongly questioned, and alternatives have been proposed, such as the Aickin Alpha coefficient and the Gwet AC1 and AC2 coefficients. A well-known paradox illustrates deficiencies of the Kappa coefficients that may be remedied by scaling Pe or De according to the uniformity of the scoring. The AC1 and AC2 coefficients result from applying this scaling to the Brennan-Prediger coefficient, which may be considered a simplified form of Kappa. This paper examines some commonly used multi-rater agreement coefficients, including AC1 and AC2. It then proposes an alternative subject-by-subject scaling approach that may be applied to weighted and unweighted multi-rater Cohen and Fleiss Kappas and also to Intra-Class Correlation (ICC) coefficients.
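The abstract does not reproduce the paper's definitions, but the standard two-rater forms it refers to can be illustrated with a short sketch. The code below is a hedged example, not the paper's proposed method: it computes Cohen's Kappa, the Brennan-Prediger coefficient, and Gwet's AC1 on a small dataset in which nearly all scores fall into one category, the setting of the well-known paradox mentioned above. Function and variable names (e.g. `agreement_coefficients`) are illustrative choices, not taken from the paper.

```python
# Illustrative sketch of standard two-rater agreement coefficients.
from collections import Counter

def agreement_coefficients(rater1, rater2, categories):
    n = len(rater1)
    q = len(categories)
    # Observed proportion of agreement Po
    po = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Marginal category counts for each rater
    c1, c2 = Counter(rater1), Counter(rater2)
    # Cohen: chance agreement Pe from the product of the raters' marginals
    pe_cohen = sum((c1[k] / n) * (c2[k] / n) for k in categories)
    # Brennan-Prediger: chance agreement assumes a uniform spread over the q categories
    pe_bp = 1 / q
    # Gwet AC1: chance agreement depends on how evenly the average marginals
    # (pi_k) are spread across categories, i.e. the uniformity of the scoring
    pi = {k: (c1[k] + c2[k]) / (2 * n) for k in categories}
    pe_ac1 = sum(pi[k] * (1 - pi[k]) for k in categories) / (q - 1)
    kappa = (po - pe_cohen) / (1 - pe_cohen)
    bp = (po - pe_bp) / (1 - pe_bp)
    ac1 = (po - pe_ac1) / (1 - pe_ac1)
    return kappa, bp, ac1

# Two raters, three categories, 90% observed agreement concentrated in one category:
# Cohen's Kappa is low despite the high agreement, while AC1 remains high.
r1 = ["a"] * 18 + ["b", "c"]
r2 = ["a"] * 17 + ["b", "a", "c"]
print(agreement_coefficients(r1, r2, ["a", "b", "c"]))
```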

