Successful Identification of Head and Neck Cancer (HNC) Nodal Metastasis (NM) and Extranodal Extension (ENE) Using Deep Learning Neural Networks

B.H Kann, S Aneja, G Loganadane V, J.R Kelly, S.M Smith, R.H Decker, W Yarbrough, A Malhotra, B Burtness, Z.A Husain

doi:10.1016/j.ijrobp.2018.06.169

Abstract

Despite modern imaging modalities, radiologic detection of clinically meaningful tumor characteristics, such as NM and, particularly, tumor ENE, remains difficult, with reported areas under the receiver operating characteristic curve (AUC) <.70 and accuracies <70%. Emerging machine learning techniques, including deep learning, have been successful in computer vision analysis. We hypothesized that computer vision with deep convolutional neural networks (DCNNs) could achieve favorable performance in identifying NM and ENE on pretreatment HNC imaging. We identified and segmented 653 lymph nodes (LNs) on preoperative, diagnostic, contrast-enhanced HNC CT scans from 270 patients with head and neck squamous cell carcinoma who subsequently underwent LN dissection at a single institution from 2013 to 2017. Radiographic segmentations were correlated with pathology reports and labeled as “negative”, “NM”, or “ENE.” A centralized re-review of the pathology reports documenting ENE was conducted to confirm ENE presence. A 3-dimensional DCNN was constructed using multilayer convolutional neural networks, with input features consisting of normalized Hounsfield units corresponding to CT voxels within and surrounding the segmented LN. Shuffled, stratified training (64%), validation (16%), and blinded test (20%) sets were isolated a priori. The DCNN was trained using oversampling, data augmentation, dropout, regularization, and cross-validation. Performance on the blinded test set was evaluated using AUC, accuracy, sensitivity, and specificity. ENE detection performance was tested on LNs with a short axis diameter (SAD) ≥1 cm. As a benchmark, a separate multivariable logistic regression model with bootstrapping was constructed using clinical characteristics and LN SAD. In total, 380 negative LNs, 153 LNs with NM without ENE, and 120 LNs with NM and ENE were identified and segmented.
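The abstract states only that the DCNN input was normalized Hounsfield units (HU) for CT voxels within and surrounding each segmented node; it does not give the intensity window, the amount of surrounding context, or the patch size. The following is a minimal Python sketch under those gaps, assuming a hypothetical soft-tissue window of -100 to 240 HU, min-max scaling to [0, 1], and a bounding-box crop with a hypothetical `margin` of context voxels. `normalize_hu` and `extract_node_patch` are illustrative names, not the authors' code.

```python
import numpy as np

def normalize_hu(volume, hu_min=-100.0, hu_max=240.0):
    """Clip CT intensities (HU) to a window and rescale them to [0, 1].

    The default window is an assumption; the abstract does not report one.
    """
    clipped = np.clip(volume.astype(np.float32), hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)

def extract_node_patch(volume, mask, margin=4):
    """Crop a normalized 3-D patch around one segmented lymph node.

    volume: CT array in HU, shape (z, y, x)
    mask:   boolean array of the same shape, True inside the segmented node
    margin: hypothetical number of context voxels added on every side
    """
    coords = np.argwhere(mask)
    if coords.size == 0:
        raise ValueError("mask contains no voxels")
    # Bounding box of the node, widened by the margin and clipped to the volume.
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + margin + 1, volume.shape)
    patch = volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    return normalize_hu(patch)

# Toy volume: air everywhere except a 4x4x4 "node" of soft tissue at 50 HU.
vol = np.full((20, 20, 20), -1000.0)
mask = np.zeros(vol.shape, dtype=bool)
mask[8:12, 8:12, 8:12] = True
vol[mask] = 50.0
patch = extract_node_patch(vol, mask, margin=2)
print(patch.shape)  # (8, 8, 8)
```

A 3-dimensional DCNN usually needs a fixed input size, so a real pipeline would also pad or resample each patch to a common shape; this sketch leaves that step out.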
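The 64%/16%/20% proportions are consistent with first holding out 20% of nodes as the blinded test set and then reserving 20% of the remaining 80% for validation (0.8 × 0.2 = 0.16). The abstract does not say how the split or the oversampling was implemented, so the sketch below is an assumption: per-class shuffling with Python's standard library, followed by random oversampling of the minority classes within the training set only. `stratified_split` and `oversample` are hypothetical helpers; the class counts are the 380/153/120 reported above.

```python
import random
from collections import Counter

def stratified_split(labels, test_frac=0.20, val_frac=0.20, seed=0):
    """Hypothetical helper: return (train, val, test) index lists, stratified by label.

    test_frac is taken from the full data set and val_frac from what remains,
    so 0.20 and 0.20 give roughly 64% / 16% / 20% overall.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class before cutting
        n_test = round(len(indices) * test_frac)
        n_val = round((len(indices) - n_test) * val_frac)
        test += indices[:n_test]
        val += indices[n_test:n_test + n_val]
        train += indices[n_test + n_val:]
    return train, val, test

def oversample(indices, labels, seed=0):
    """Hypothetical helper: duplicate random minority-class samples up to the largest class size."""
    rng = random.Random(seed)
    by_class = {}
    for idx in indices:
        by_class.setdefault(labels[idx], []).append(idx)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced += members + extra
    rng.shuffle(balanced)
    return balanced

labels = ["negative"] * 380 + ["NM"] * 153 + ["ENE"] * 120
train, val, test = stratified_split(labels)
train_balanced = oversample(train, labels)  # oversample the training set only
print(len(train), len(val), len(test))  # 418 104 131
print(Counter(labels[i] for i in train_balanced))
```

Oversampling after the split keeps duplicated nodes out of the validation and test sets, which retain the natural class balance. Because the 653 nodes come from 270 patients, a patient-level split (all nodes from one patient in the same set) may also be worth considering; this sketch splits at the node level only.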
On logistic regression, clinical N-stage and greater SAD were predictive of ENE (each P<.001), with a negative interaction between HPV status and SAD (P<.05). Primary site (oropharynx) and SAD were predictive of NM (each P<.001). The DCNN had higher test-set performance than the benchmark logistic model for both ENE (AUC: .91 vs .81; accuracy: 85.7% vs 77.7%) and NM prediction, and both models outperformed historical controls for ENE prediction (Table 1). Deep neural networks identify HNC NM and ENE with better predictive performance than traditional logistic models and human-operator historical controls, and show promise as clinician decision-making tools. External validation and prospective testing are being planned to determine the generalizability of these algorithms.

Abstract 118; Table 1

                ENE                 NM
                DCNN    Logistic    DCNN    Logistic
AUC             .91     .81         .91     .86
Accuracy (%)    85.7    77.7        85.5    76.1
Sensitivity     .88     .72         .84     .79
Specificity     .85     .80         .87     .74
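Table 1 reports AUC, accuracy, sensitivity, and specificity on the blinded test set. The abstract does not describe how these were computed, so the following is a standard-library sketch for one binary task (for example, ENE vs. no ENE), assuming predicted probabilities dichotomized at a hypothetical threshold of 0.5 and AUC computed in its Mann-Whitney form: the probability that a randomly chosen positive node scores higher than a randomly chosen negative one, with ties counted as half. `auc_rank` and `binary_metrics` are illustrative names, and the toy labels and scores are invented, not study data.

```python
def auc_rank(y_true, scores):
    """AUC as P(random positive outscores random negative), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def binary_metrics(y_true, scores, threshold=0.5):
    """AUC plus accuracy, sensitivity, and specificity at an assumed fixed threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {
        "auc": auc_rank(y_true, scores),
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }

# Toy example (made-up labels and scores, not study data)
y = [1, 1, 1, 0, 0, 0, 0, 0]
p = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
print(binary_metrics(y, p))
# -> AUC = 14/15, accuracy = 0.75, sensitivity = 2/3, specificity = 0.8
```

Accuracy depends on the class mix of the test set, whereas sensitivity and specificity do not, which is one reason to report all four as Table 1 does. The quadratic pair loop in `auc_rank` is adequate for test sets of a few hundred nodes.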
