Abstract

Recently, there has been significant interest in using the rich, multi-modal data routinely collected in the clinic, including imaging, demographic and clinical information, for prognostic factor discovery in cancer. In particular, the emerging field of radiomics makes use of computational tools to extract quantitative features from radiological images, with the aim of capturing the morphological and biological characteristics of tumors. Previous studies have demonstrated the potential of computed tomography (CT) imaging features as independent prognostic factors for overall survival in multiple types of cancer, including head and neck cancer (HNC). However, poor reproducibility and a lack of large, rigorous validation studies have so far hindered widespread clinical use of radiomics. We conducted an HNC survival prediction challenge with the aim of 1) developing an accurate prognostic model for HNC survival using clinical, demographic and routinely collected CT imaging data and 2) evaluating the true added value of CT radiomics compared to other prognostic factors. Using a large, retrospective cohort of 2552 patients, we assessed the prognostic performance of 12 different approaches developed by several research groups at University Health Network in Toronto, making use of engineered radiomics, deep learning, clinical information and combinations of those. To allow for unbiased comparison between different approaches, all participants had access to a public training dataset of 1802 patients, while 750 were held out for evaluation.
The best challenge submission used a deep multi-task learning framework on clinical data and tumour volume, achieving an area under the ROC curve (AUROC) of 0.812 [95% CI 0.763–0.858] for 2-year survival prediction and a concordance (C) index of 0.795 [0.751–0.838] for lifetime risk prediction. It outperformed the best clinical-only model (AUROC=0.800 [0.749–0.848], C=0.708 [0.661–0.754]), the best radiomics-only model (AUROC=0.766 [0.719–0.811], C=0.748 [0.704–0.790]), and the best model combining deep radiomics with clinical features (AUROC=0.786 [0.733–0.836], C=0.774 [0.726–0.820]). We also used a ‘wisdom of the crowds’ ensemble approach to combine the predictions of all challenge submissions and to determine whether the combination could improve on the individual models. The ensemble achieved stronger performance than any individual model (AUROC=0.823 [0.778–0.864], C=0.810 [0.772–0.845]), indicating that there might be complementary information between the different data modalities. Our rigorous challenge framework allowed us to evaluate a diverse collection of prognostic models in a large multi-modal dataset, demonstrating the value of machine learning in HNC prognostication, as well as the advantages of simple imaging features over several hand-engineered and deep radiomics approaches. Furthermore, our ensemble approach achieves excellent performance for both 2-year and lifetime risk prediction, establishing a new state of the art in HNC prognostic modelling.

Citation Format: Michal Kazmierski, Mattea Welch, Benjamin Haibe-Kains. Radiomics for head and neck cancer prognostication: results from the RADCURE machine learning challenge [abstract]. In: Proceedings of the AACR Virtual Special Conference on Artificial Intelligence, Diagnosis, and Imaging; 2021 Jan 13-14. Philadelphia (PA): AACR; Clin Cancer Res 2021;27(5_Suppl):Abstract nr PO-030.
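
For readers unfamiliar with the metrics reported in the abstract, the following is a minimal sketch of how a concordance (C) index can be computed and how predictions from several models might be combined. It is not the challenge's evaluation code: the abstract does not state how the ensemble was formed, so the simple averaging shown here is an assumption, and the function names (`concordance_index`, `ensemble_mean`) and the toy data are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable patient pairs ranked correctly by risk.

    times  : observed follow-up times
    events : 1 if death was observed, 0 if the patient was censored
    risks  : predicted risk scores (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the shorter
        # time corresponds to an observed event (not censoring).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def ensemble_mean(predictions):
    """Average per-patient risk scores across models (assumed ensembling rule)."""
    n_models = len(predictions)
    return [sum(scores) / n_models for scores in zip(*predictions)]


# Toy data, invented purely for illustration (not from the RADCURE cohort).
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
model_a = [0.9, 0.2, 0.5, 0.1]
model_b = [0.3, 0.8, 0.4, 0.2]
combined = ensemble_mean([model_a, model_b])

for name, risks in [("model_a", model_a), ("model_b", model_b), ("ensemble", combined)]:
    print(name, concordance_index(times, events, risks))
```

In this toy example each model misorders different patient pairs, so averaging their scores ranks more pairs correctly than either model alone, which is the kind of complementarity the ensemble result above hints at. Averaging raw scores assumes all models output risks on a comparable scale; rank-based averaging would be one alternative when they do not.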