Survival prediction is a ubiquitous challenge in medicine, yet only a handful of methods can make survival predictions from time-varying data. Here we propose a novel method for this problem based on a random forest of survival trees for left-truncated and right-censored data. We demonstrate the advantage of our method by predicting breast cancer and prostate cancer risk among healthy individuals from routine laboratory measurements, vital signs and age. We analyzed the electronic medical records of 20,317 healthy individuals who underwent routine checkups and identified those who later developed cancer. In cross-validation, our method predicted future prostate and breast cancers six months before diagnosis with an area under the ROC curve of 0.62±0.05 and 0.60±0.03, respectively. It outperformed a standard random forest, a random survival forest, a Cox regression model, Dynamic-DeepHit and a single survival tree. Our work offers a new framework for survival risk prediction from time-varying data, and our results suggest that computational analysis of data from healthy individuals can improve the identification of those at risk of developing cancer.
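The key ingredient named above, a survival tree that handles left-truncated and right-censored (LTRC) data, can be illustrated with a split criterion. The sketch below is not the authors' implementation. It assumes each record is an observation interval (entry, stop] with an event indicator, for example one row per interval between checkups, which is a common way to encode time-varying measurements. It then uses the standard log-rank statistic, with the risk set at each event time restricted to records already under observation. The function names `ltrc_logrank` and `best_split` are hypothetical.

```python
import numpy as np


def ltrc_logrank(entry, stop, event, group):
    """Log-rank statistic for two groups under left truncation and right censoring.

    Record i is observed on (entry[i], stop[i]] with entry[i] < stop[i];
    event[i] is 1 if the event occurred at stop[i] and 0 if censored.
    A record is at risk at time t only if entry[i] < t <= stop[i].
    group is boolean: True marks the left child of a candidate split.
    """
    entry, stop = np.asarray(entry, float), np.asarray(stop, float)
    event, group = np.asarray(event, bool), np.asarray(group, bool)
    num, var = 0.0, 0.0
    for t in np.unique(stop[event]):          # distinct event times
        at_risk = (entry < t) & (stop >= t)   # left truncation: must have entered before t
        n = at_risk.sum()
        if n < 2:
            continue                          # variance term undefined with one record
        n1 = (at_risk & group).sum()
        died = event & (stop == t)
        d, d1 = died.sum(), (died & group).sum()
        num += d1 - d * n1 / n                # observed minus expected events in group
        var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return num ** 2 / var if var > 0 else 0.0


def best_split(x, entry, stop, event, min_leaf=5):
    """Return (threshold, statistic) of the best split x <= threshold, or (None, 0.0)."""
    x = np.asarray(x)
    best = (None, 0.0)
    for c in np.unique(x)[:-1]:
        left = x <= c
        if left.sum() < min_leaf or (~left).sum() < min_leaf:
            continue                          # enforce a minimum node size
        stat = ltrc_logrank(entry, stop, event, left)
        if stat > best[1]:
            best = (c, stat)
    return best


# Four records, all observed from time 0; the first two form the left group.
print(ltrc_logrank([0, 0, 0, 0], [1, 2, 3, 4], [1, 1, 1, 1], [True, True, False, False]))
```

The only change from an ordinary log-rank split is the `entry < t` condition. A record that enters observation late, such as a person whose first checkup occurs after an event time, does not count toward that time's risk set. A forest would grow many such trees on resampled data with random feature subsets and aggregate their risk estimates. That step, and how the authors handle repeated records from the same person, is not shown here.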