Quantifying rater variation for ordinal data using a rating scale model.

Shiqi Zhang, Jørgen Holm Petersen

doi:10.1002/sim.7639

Abstract

We present a model-based approach to the analysis of agreement between different raters in a situation where all raters have supplied ordinal ratings of the same cases in a sample. It is assumed that no "gold standard" is available. The model is an ordinal regression model with random effects, a so-called rating scale model. The model includes case-specific parameters that allow each case its own level (disease severity). It also allows raters to differ in their propensity to score a given set of cases more or less positively (the rater level). Based on the model, we suggest quantifying the rater variation using the median odds ratio. This allows the variation to be expressed on the same scale as the observed ordinal data. An important example that will serve to motivate and illustrate the proposed model is the study of breast cancer diagnosis based on screening mammograms. The purpose of the assessment is to detect breast cancer early in order to improve cancer survival. In the study, mammograms from 148 women were evaluated by 110 expert radiologists. The experts were asked to rate each mammogram on a 5-point scale ranging from "normal" to "probably malignant."
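
To make the median odds ratio concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that the rater levels are normally distributed on the log-odds scale with variance sigma2, which the abstract does not state. Under that assumption, the median odds ratio between two randomly chosen raters scoring the same case has the commonly used closed form exp(sqrt(2 * sigma2) * z), where z is the 75th percentile of the standard normal distribution. The function name and the example variance are hypothetical.

```python
import math
from statistics import NormalDist


def median_odds_ratio(sigma2: float) -> float:
    """Median odds ratio for a normal random effect with variance sigma2.

    Assumption (not taken from the abstract): rater levels are normally
    distributed on the log-odds scale. The difference between two
    independent rater levels then has variance 2 * sigma2, and the median
    of its absolute value is sqrt(2 * sigma2) times the 75th percentile
    of the standard normal distribution.
    """
    if sigma2 < 0:
        raise ValueError("sigma2 must be non-negative")
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of N(0, 1)
    return math.exp(math.sqrt(2.0 * sigma2) * z75)


if __name__ == "__main__":
    # Hypothetical rater variance, chosen only for illustration.
    example_sigma2 = 1.0
    print(f"MOR for sigma2={example_sigma2}: {median_odds_ratio(example_sigma2):.3f}")
```

A value of 1 corresponds to no rater variation. Larger values mean that, for the same case, the odds of a rating above any given cut-point typically differ more between two randomly chosen raters.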
