Recently there has been an increased demand for Computer-Aided Diagnosis (CAD) tools to support clinicians in the field of Indirect ImmunoFluorescence (IIF), as the novel digital image reading approach can help overcome reader subjectivity. Nevertheless, a large multicenter evaluation of inter-observer reading variability in this field is still missing. This work fills that gap: we evaluated 556 consecutive samples, for a total of 1679 images, collected in three laboratories with IIF expertise using a HEp-2 cell substrate (MBL) at the 1:80 screening dilution according to conventional procedures. In each laboratory, the images were blindly classified by two experts into three intensity classes: positive, negative, and weak positive. Positive and weak positive ANA-IIF results were further categorized by the predominant fluorescence pattern among six main classes. Data were analyzed pairwise, and inter-observer reading variability was measured with Cohen's kappa test, revealing pairwise agreement close to the substantial level both for fluorescence intensity and for staining pattern recognition (k=0.602 and k=0.627, respectively). We also observed that inter-observer reading variability decreases when it is measured against a gold standard classification computed from the labels assigned by the three laboratories. These data show that laboratory agreement improves when digital images are used and each individual human evaluation is compared to reference data, suggesting that a solid gold standard is essential to make proper use of CAD systems in routine laboratory work.
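The Cohen's kappa statistic used to quantify inter-observer agreement can be sketched as follows. This is a minimal illustration with hypothetical reader labels, not the study's actual data: kappa is the observed agreement corrected for the agreement expected by chance given each reader's label frequencies.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters over the same samples."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of samples where the two readers agree
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the two readers labeled independently,
    # given their marginal class frequencies
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical intensity labels from two readers
# (three classes as in the study: positive, negative, weak positive)
reader1 = ["pos", "pos", "neg", "weak", "pos", "neg", "weak", "neg"]
reader2 = ["pos", "weak", "neg", "weak", "pos", "neg", "pos", "neg"]
print(round(cohens_kappa(reader1, reader2), 3))  # -> 0.619
```

Here the two readers agree on 6 of 8 samples (observed agreement 0.75), but because some agreement would occur by chance alone (expected agreement ~0.344), the chance-corrected kappa is lower, ~0.62, which is comparable in magnitude to the values reported in the abstract.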