Abstract

The measurement of agreement between repeat ratings is the usual method of assessing the reliability of categorical scales. Measurement of agreement is also important in genetic twin studies based on categorical scales. One of the most commonly used methods of analysis for both types of study is the kappa coefficient. For scales with more than two categories, one approach is to use a single summary kappa coefficient. While this may be sufficient for many studies, in some instances investigation of heterogeneity in the pattern of agreement may give additional insights, as there may be greater agreement for some pairs of categories than for others. In this paper, kappa-type coefficients are used to model heterogeneity in the pattern of agreement. Constraints are added to the heterogeneous model to obtain simplified models. Procedures for estimation, confidence intervals, and inference for these coefficients are described for the case of two ratings per subject, both for a single sample and for the comparison of two independent samples. Formulae for sample size and power calculations are derived using the non-central chi-squared distribution. Two simulation studies are carried out to check the empirical test size and power. The methods are illustrated by two examples involving nominal scales with three categories.
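
The abstract does not reproduce the paper's estimators, so the sketch below is only a rough illustration of the quantities it refers to: it computes the usual single summary (Cohen's) kappa for a 3-category table and the power of a chi-squared test from a given non-centrality parameter, using SciPy's non-central chi-squared distribution. The table counts and the non-centrality value are hypothetical, and this is not the heterogeneous kappa-type model developed in the paper.

```python
import numpy as np
from scipy.stats import chi2, ncx2

def cohen_kappa(table):
    """Single summary (Cohen's) kappa for a k x k table of two ratings per subject."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                       # cell proportions
    p_o = np.trace(p)                     # observed agreement (diagonal)
    p_e = p.sum(axis=1) @ p.sum(axis=0)   # chance agreement from the marginals
    return (p_o - p_e) / (1.0 - p_e)

def chi2_power(noncentrality, df, alpha=0.05):
    """Power of a chi-squared test with the given non-centrality parameter."""
    crit = chi2.ppf(1.0 - alpha, df)      # critical value under the null
    return ncx2.sf(crit, df, noncentrality)

# Hypothetical 3-category example: counts of (rating 1, rating 2) pairs
counts = [[40, 5, 3],
          [4, 30, 6],
          [2, 5, 25]]
print(cohen_kappa(counts))        # single summary kappa for the table
print(chi2_power(10.0, df=2))     # power for a 2-df test, lambda = 10 (illustrative)
```

In the paper the non-centrality parameter would be derived from the constrained versus heterogeneous agreement models when computing sample size and power; here it is simply supplied by hand to show the mechanics of the calculation.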
