In-depth analysis of interreader agreement and accuracy in categorical assessment of brown adipose tissue in (18)FDG-PET/CT

Anton S Becker, Caroline Zellweger, Khoschy Schawkat, Sanja Bogdanovic, Valerie Doan Phi Van, Hannes W Nagel, Christian Wolfrum, Irene A Burger

doi:10.1016/j.ejrad.2017.03.012

Abstract

Purpose: To evaluate the interreader agreement of a three-tier craniocaudal grading system for brown fat activation and to investigate the accuracy of the distinction between the three grades.

Materials and methods: After IRB approval, 340 cases were retrospectively selected from patients undergoing (18)FDG-PET/CT between 2007 and 2015 at our institution, with 85 cases in each grade and 85 controls with no active brown fat. Three readers evaluated all cases independently. In addition, standardized uptake value (SUV) measurements were performed by two readers in a subset of 53 cases. Agreement between the readers was assessed with Cohen's kappa (k), the concordance correlation coefficient (CCC) and the intraclass correlation coefficient (ICC). Accuracy was assessed with Bland-Altman and receiver operating characteristic (ROC) analysis. A Bonferroni-corrected two-tailed p<0.016 was considered statistically significant.

Results: Agreement for BAT grade was excellent by all three metrics, with k=0.83–0.89, CCC=0.83–0.89 and ICC=0.91–0.94. Bland-Altman analysis revealed only slight average over- or underestimation (−0.01 to 0.14), with the majority of disagreements within one grade. ROC analysis yielded slightly less accurate classification between higher vs. lower grades (area under the ROC curve 0.78–0.84 vs. 0.88–0.92) but no significant differences between readers. Agreement was also excellent for the maximum SUV and the total brown fat volume (k=0.90 and 0.94, CCC=0.93 and 0.99, ICC=0.96 and 0.99), but Bland-Altman plots revealed a tendency of one reader to underestimate activity.

Conclusion: Grading the activation of brown fat by assessment of the most caudally activated depots results in excellent interreader agreement, comparable to SUV measurements.
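
For readers unfamiliar with the agreement metrics named in the abstract, the following is a minimal Python sketch of how Cohen's kappa, the concordance correlation coefficient and the Bland-Altman mean difference can be computed for two readers. It is an illustration only, not the authors' analysis code: the grades in `reader_a` and `reader_b` are invented, the function names are hypothetical, and the kappa shown is the unweighted form (the abstract does not state whether a weighted variant was used).

```python
from collections import Counter
from statistics import mean


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two readers' categorical ratings."""
    n = len(a)
    # Proportion of cases where both readers assign the same grade.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each reader's marginal frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / n ** 2
    return (observed - expected) / (1 - expected)


def lin_ccc(a, b):
    """Lin's concordance correlation coefficient (population moments)."""
    mean_a, mean_b = mean(a), mean(b)
    var_a = mean((x - mean_a) ** 2 for x in a)
    var_b = mean((y - mean_b) ** 2 for y in b)
    cov = mean((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return 2 * cov / (var_a + var_b + (mean_a - mean_b) ** 2)


def bland_altman_bias(a, b):
    """Mean difference (reader A minus reader B), the Bland-Altman bias."""
    return mean(x - y for x, y in zip(a, b))


# Hypothetical grades (0 = no active brown fat, 1-3 = grade) for 8 cases.
reader_a = [0, 0, 1, 1, 2, 2, 3, 3]
reader_b = [0, 0, 1, 2, 2, 2, 3, 3]

print(f"kappa = {cohen_kappa(reader_a, reader_b):.2f}")        # 0.83
print(f"CCC   = {lin_ccc(reader_a, reader_b):.2f}")            # 0.95
print(f"bias  = {bland_altman_bias(reader_a, reader_b):.3f}")  # -0.125
```

In this toy example the readers disagree on a single case by one grade, which lowers kappa more than the CCC because unweighted kappa treats every disagreement as complete, regardless of how far apart the grades are.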

Full Text