Oropharyngeal Squamous Cell Carcinoma (OPSCC) is one of the common forms of heterogeneity in head and neck cancer. Infection with human papillomavirus (HPV) has been identified as a major risk factor for OPSCC. Therefore, differentiating the HPV-positive and negative cases in OPSCC patients is an essential diagnostic factor influencing future treatment decisions. In this study, we investigated the accuracy of a deep learning-based method for image interpretation and automatically detected the HPV status of OPSCC in routinely acquired Computed Tomography (CT) and Positron Emission Tomography (PET) images. We introduce a 3D CNN-based multi-modal feature fusion architecture for HPV status prediction in primary tumor lesions. The architecture is composed of an ensemble of CNN networks and merges image features in a softmax classification layer. The pipeline separately learns the intensity, contrast variation, shape, texture heterogeneity, and metabolic assessment from CT and PET tumor volume regions and fuses those multi-modal features for final HPV status classification. The precision, recall, and AUC scores of the proposed method are computed, and the results are compared with other existing models. The experimental results demonstrate that the multi-modal ensemble model with soft voting outperformed single-modality PET/CT, with an AUC of 0.76 and F1 score of 0.746 on publicly available TCGA and MAASTRO datasets. In the MAASTRO dataset, our model achieved an AUC score of 0.74 over primary tumor volumes of interest (VOIs). In the future, more extensive cohort validation may suffice for better diagnostic accuracy and provide preliminary assessment before the biopsy.
Read full abstract