Confidence intervals of survival predictions with neural networks trained on molecular data

Elvire Roblin, Paul-Henry Cournède, Stefan Michiels

doi:10.1016/j.imu.2023.101426

Abstract

In medicine, an important objective is predicting patients’ survival based on their molecular and clinical characteristics. In this context, neural networks have recently been used for their ability to capture complex interactions in the data. Measuring the uncertainty associated with survival estimates obtained by neural networks is essential to enhance predictions’ reliability. We compared four methods adapted to multilayer perceptrons (MLPs) for building confidence intervals at the patient level. The methods were based either on bootstrap with Boot (Efron, 1979), ensembling with DeepEns (Lakshminarayanan et al., 2016), or Monte-Carlo Dropout with MCDrop and BMask (Gal and Ghahramani, 2016; Mancini et al., 2020). The comparison was made across MLP-based survival models: CoxCC and CoxTime (Kvamme et al., 2019) in a continuous time framework, DeepHit (Lee et al., 2018) and PLANN (Biganzoli et al., 1998) in a discrete time framework. We applied the methods to a simulation study, enabling us to estimate the coverage rate of the estimated confidence intervals. We also applied them to real-world datasets, and predicted the survival probability for patients with breast cancer and patients with lung cancer. In the simulation study, CoxCC and CoxTime obtained the mean C-indices numerically closest to those from the Oracle model (mean C-index of 0.723 for CoxCC, 0.726 for CoxTime, versus 0.743 for the Oracle model). Regarding the confidence intervals of survival probabilities, Boot with CoxCC obtained a coverage rate of 96.5%, the closest to the nominal value of 95%. MCDrop was slightly anticonservative and obtained a coverage rate of 89.8% with CoxTime. This method may represent a reasonable compromise between coverage and computational time. In the breast cancer cohort, MLPs had difficulty capturing additional prognostic information from the molecular data.
In contrast, in the lung cancer cohort, the models led to substantially stronger discrimination values when adding molecular data to the clinical variables. In conclusion, we were able to represent uncertainty in the survival estimates at particular time points at the patient level using MLPs in the form of 95% confidence intervals. We recommend using CoxTime with either Boot or, for a less intensive computation time, MCDrop.
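
To make the notion of coverage rate concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each patient has a set of repeated survival-probability predictions at a fixed time point (for example, one per bootstrap replicate or one per stochastic forward pass with dropout), that a 95% interval is taken as the empirical 2.5% and 97.5% percentiles of those predictions, and that the true survival probability is known, as in a simulation. The function names `percentile_interval` and `coverage_rate`, the noise levels, and the toy data are all hypothetical.

```python
import math
import random


def percentile_interval(samples, level=0.95):
    """Empirical percentile interval from repeated predictions for one patient.

    Assumption: the interval bounds are order statistics of the predictions;
    the lower index is rounded down and the upper index is rounded up.
    """
    ordered = sorted(samples)
    n = len(ordered)
    alpha = 1.0 - level
    lo_idx = math.floor((alpha / 2) * (n - 1))
    hi_idx = math.ceil((1 - alpha / 2) * (n - 1))
    return ordered[lo_idx], ordered[hi_idx]


def coverage_rate(intervals, true_values):
    """Fraction of intervals that contain the corresponding true value."""
    hits = sum(lo <= t <= hi for (lo, hi), t in zip(intervals, true_values))
    return hits / len(true_values)


# Toy simulation (hypothetical numbers, for illustration only).
rng = random.Random(0)
n_patients, n_replicates = 200, 100

# "True" survival probabilities at a fixed time point, known only in simulation.
true_survival = [rng.uniform(0.2, 0.9) for _ in range(n_patients)]

intervals = []
for s_true in true_survival:
    # The model's point estimate is off by some estimation error ...
    center = s_true + rng.gauss(0, 0.05)
    # ... and repeated predictions scatter around that estimate.
    predictions = [center + rng.gauss(0, 0.05) for _ in range(n_replicates)]
    intervals.append(percentile_interval(predictions, level=0.95))

print("Empirical coverage:", coverage_rate(intervals, true_survival))
```

A coverage rate above the nominal 95% indicates conservative (too wide) intervals, whereas a rate below it, such as the 89.8% reported for MCDrop with CoxTime, indicates anticonservative (too narrow) intervals.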

Full Text