Pathologic extranodal extension (pENE) is a critical prognostic factor in oropharyngeal cancer (OPC) that drives therapeutic disposition. Determination of pENE from radiological imaging has been associated with high inter-observer variability. However, the impact of clinician specialty on human observer performance in identifying imaging-detected extranodal extension (iENE) remains poorly understood. This study aimed to characterize the impact of clinician specialty on the accuracy of pre-operative iENE assessment in human papillomavirus-positive (HPV+) OPC using computed tomography (CT) images. This prospective observational human performance study analyzed pre-therapy CT images from 24 HPV+ OPC patients, with duplication of 6 scans (n=30), of which 21 had pathologically confirmed pENE. Thirty-four expert observers, including 11 radiologists, 12 surgeons, and 11 radiation oncologists, independently assessed these scans for iENE and reported human-detected radiologic criteria and observer confidence. The primary outcomes were accuracy, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and Brier score for each physician, compared to ground-truth pENE. The significance of radiographic signs for prediction of pENE was determined through logistic regression analysis. Interobserver agreement was measured with Fleiss' kappa, and AUC discrimination was tested with the Hanley-McNeil method. Median accuracy across all specialties was 0.57 (95% CI 0.39 to 0.73), with no specialty showing discriminative performance greater than random estimation (median AUC 0.64, 95% CI 0.44 to 0.83). Significant differences were observed between radiologists and surgeons in Brier scores (0.33 vs. 0.26, p < 0.01), radiation oncologists and surgeons in sensitivity (0.48 vs. 0.69, p > 0.1), and radiation oncologists and radiologists/surgeons in specificity (0.89 vs. 0.56, p > 0.1). Indistinct capsular contour and nodal necrosis were significant predictors of correct pENE status among all specialties.
Interobserver agreement was weak for all radiographic criteria, regardless of specialty (κ<0.6). Multiobserver testing shows that physician discrimination of pENE in HPV+ OPC on pre-operative CT remains no different from random guessing, with high interrater variability and low diagnostic accuracy, regardless of clinician specialty. Although minor differences in diagnostic performance among specialties were noted, they do not significantly affect the overall poor agreement and discrimination rates observed. The findings underscore the need for further research into automated detection systems or enhanced imaging techniques to improve the accuracy and reliability of iENE assessments in clinical practice.
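For readers less familiar with the outcome measures named above, the following is a minimal sketch of how they can be computed. It is not the study's analysis code. The function names, the 0.5 decision threshold, and the input layout are assumptions made for illustration, and the standard-error function follows the Hanley and McNeil (1982) approximation for a single AUC.

```python
import math
from typing import Dict, Sequence


def observer_metrics(truth: Sequence[int], probs: Sequence[float],
                     threshold: float = 0.5) -> Dict[str, float]:
    """Per-observer metrics against ground-truth pENE (1 = pENE present).

    Assumes both classes occur in `truth`; `threshold` is an illustrative choice.
    """
    calls = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for t, c in zip(truth, calls) if t == 1 and c == 1)
    tn = sum(1 for t, c in zip(truth, calls) if t == 0 and c == 0)
    fp = sum(1 for t, c in zip(truth, calls) if t == 0 and c == 1)
    fn = sum(1 for t, c in zip(truth, calls) if t == 1 and c == 0)
    pos = [p for t, p in zip(truth, probs) if t == 1]
    neg = [p for t, p in zip(truth, probs) if t == 0]
    # AUC as the probability that a positive case is rated above a negative one
    # (ties count one half), i.e. the Mann-Whitney formulation.
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return {
        "accuracy": (tp + tn) / len(truth),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "auc": wins / (len(pos) * len(neg)),
        # Brier score: mean squared error of the stated probability.
        "brier": sum((p - t) ** 2 for t, p in zip(truth, probs)) / len(truth),
    }


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of one AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa; counts[i][j] = number of raters assigning case i to category j.

    Assumes every case is rated by the same number of raters (at least two)
    and that the raters do not all choose one single category overall.
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    # Observed agreement, averaged over cases.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_cases
    # Chance agreement from the overall category proportions.
    n_cats = len(counts[0])
    p_j = [sum(row[j] for row in counts) / (n_cases * n_raters) for j in range(n_cats)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)
```

In such a setup, `observer_metrics` would be called once per observer with that observer's confidence ratings rescaled to probabilities, and `fleiss_kappa` once per radiographic criterion with the per-scan counts of observers who did and did not report it.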