Background and purpose: Survival is frequently assessed using Cox proportional hazards (CPH) regression; however, CPH may be too simplistic as it assumes a linear relationship between covariables and the outcome. Alternative, non-linear machine learning (ML)-based approaches, such as random survival forests (RSFs) and, more recently, deep learning (DL) have been proposed; however, these techniques are largely black-box in nature, limiting explainability. We compared CPH, RSF and DL to predict overall survival (OS) of non-small cell lung cancer (NSCLC) patients receiving radiotherapy using pre-treatment covariables. We employed explainable techniques to provide insights into the contribution of each covariable on OS prediction.

Materials and methods: The dataset contained 471 stage I-IV NSCLC patients treated with radiotherapy. We built CPH, RSF and DL OS prediction models using several baseline covariable combinations. 10-fold Monte-Carlo cross-validation was employed with a split of 70%:10%:20% for training, validation and testing, respectively. We primarily evaluated performance using the concordance index (C-index) and integrated Brier score (IBS). Local interpretable model-agnostic explanation (LIME) values, adapted for use in survival analysis, were computed for each model.

Results: The DL method exhibited a significantly improved C-index of 0.670 compared to the CPH and a significantly improved IBS of 0.121 compared to the CPH and RSF approaches. LIME values suggested that, for the DL method, the three most important covariables in OS prediction were stage, administration of chemotherapy and oesophageal mean radiation dose.

Conclusion: We show that, using pre-treatment covariables, a DL approach demonstrates superior performance over CPH and RSF for OS prediction and use explainable techniques to provide transparency and interpretability.
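
For readers unfamiliar with the evaluation protocol named in the methods, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that "10-fold Monte-Carlo cross-validation" means ten independent random 70%:10%:20% splits, and it computes Harrell's C-index in its simplest form (pairs with tied survival times are ignored, tied risk scores count as half). The function names `monte_carlo_splits` and `concordance_index`, the rounding rule for split sizes and the seed are all hypothetical choices made for illustration.

```python
import random


def monte_carlo_splits(n_samples, n_repeats=10, fractions=(0.7, 0.1, 0.2), seed=0):
    """Yield (train, val, test) index lists for repeated random splits.

    Assumption: split sizes are obtained by rounding; the test set takes
    whatever remains after the training and validation sets are drawn.
    """
    rng = random.Random(seed)
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh random permutation for every repeat
        yield idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs that are ranked correctly.

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter time than subject j. The pair is concordant when the
    model assigns subject i the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Split sizes for a cohort of 471 patients, as in the abstract.
    train, val, test = next(monte_carlo_splits(471))
    print(len(train), len(val), len(test))
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported value of 0.670. In practice the index would be computed on each of the ten test sets and then summarised across repeats.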