In many imaging studies, each case is reviewed by human readers and characterized according to one or more features. Often, the inter-reader agreement of the feature indications is of interest in addition to their diagnostic accuracy or association with clinical outcomes. Complete designs, in which all participating readers review all cases, maximize efficiency and guarantee estimability of agreement metrics for all pairs of readers but often involve a heavy reading burden. Assigning readers to cases using balanced incomplete block designs substantially reduces reading burden by having each reader review only a subset of cases, while still maintaining estimability of inter-reader agreement for all pairs of readers. Methodology for data analysis and for power and sample size calculations under balanced incomplete block designs is presented, evaluated in simulation studies, and applied to an actual example. Simulation results suggest that such designs may reduce reading burdens by >40% while, in most scenarios, incurring a <20% increase in standard errors and <8% and <20% reductions in power to detect between-modality differences in diagnostic accuracy and agreement statistics, respectively.
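As a concrete illustration of the design idea, the following minimal Python sketch assigns readers to cases so that every reader reviews the same number of cases and every pair of readers shares the same number of cases. It assumes one simple construction (cycling through all subsets of `readers_per_case` readers), which may differ from the construction used in the paper; the function names and the example sizes (5 readers, 3 readers per case, 100 cases) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def assign_readers(n_readers, readers_per_case, n_cases):
    """Assign a subset of readers to each case.

    Assumption: balance is obtained by cycling through every subset of
    `readers_per_case` readers, so n_cases must be a multiple of the
    number of such subsets.
    """
    blocks = list(combinations(range(n_readers), readers_per_case))
    if n_cases % len(blocks) != 0:
        raise ValueError("n_cases must be a multiple of the number of reader subsets")
    return [blocks[i % len(blocks)] for i in range(n_cases)]


def reader_loads(assignment):
    """Number of cases reviewed by each reader."""
    return Counter(reader for block in assignment for reader in block)


def pair_overlaps(assignment):
    """Number of cases reviewed jointly by each pair of readers."""
    return Counter(pair for block in assignment for pair in combinations(block, 2))


# Hypothetical example sizes, chosen only for illustration.
n_cases = 100
assignment = assign_readers(n_readers=5, readers_per_case=3, n_cases=n_cases)
loads = reader_loads(assignment)
overlaps = pair_overlaps(assignment)

# Fraction of reads saved per reader relative to a complete design.
burden_reduction = 1 - loads[0] / n_cases

print(dict(loads))
print(dict(overlaps))
print(burden_reduction)
```

Because every pair of readers still shares cases, pairwise agreement remains estimable for all pairs even though no reader reviews every case.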