Abstract

Random survival forests (RSF) are a powerful nonparametric method for building prediction models with a time-to-event outcome. RSF do not rely on the proportional hazards assumption and can be readily applied to both low- and higher-dimensional data. A remaining limitation of RSF, however, is that the method is almost entirely focussed on continuously measured event times. This becomes problematic in studies where time is measured on a discrete scale t = 1, 2, ..., referring to time intervals [0, a_1), [a_1, a_2), .... In this situation, applying methods designed for continuous time-to-event data may lead to biased estimators and inaccurate predictions if discreteness is ignored. To address this issue, we develop an RSF algorithm that is specifically designed for the analysis of (possibly right-censored) discrete event times. The algorithm is based on an ensemble of discrete-time survival trees that operate on transformed versions of the original time-to-event data using tree methods for binary classification. As the outcome variable in these trees is typically highly imbalanced, our algorithm implements a node splitting strategy based on Hellinger’s distance, which is a skew-insensitive alternative to classical split criteria such as the Gini impurity. The new algorithm thus provides flexible nonparametric predictions of individual-specific discrete hazard and survival functions. Our numerical results suggest that node splitting by Hellinger’s distance improves predictive performance when compared to the Gini impurity. Furthermore, discrete-time RSF improve prediction accuracy when compared to RSF approaches that treat discrete event times as continuous in situations where the number of time intervals is small.
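The transformation to a binary classification problem mentioned above can be illustrated with a minimal sketch. It assumes the usual person-period expansion from discrete-time survival analysis: each subject contributes one row per interval at risk, with a binary outcome that equals 1 only in the interval where the event is observed. The function names `to_person_period` and `survival_from_hazards` are hypothetical, and the convention that a censored subject contributes rows up to and including its last observed interval is an assumption, not necessarily the one used in the paper.

```python
from typing import List, Sequence, Tuple

Row = Tuple[Tuple[float, ...], int, int]  # (covariates, interval t, binary outcome y)


def to_person_period(time: int, event: int, covariates: Tuple[float, ...]) -> List[Row]:
    """Expand one subject into one row per discrete interval at risk.

    Assumption: a censored subject (event == 0) contributes rows for
    intervals 1..time, all with y = 0. Other conventions stop at time - 1.
    """
    rows: List[Row] = []
    for t in range(1, time + 1):
        # y is 1 only in the interval where the event is actually observed.
        y = 1 if (event == 1 and t == time) else 0
        rows.append((covariates, t, y))
    return rows


def survival_from_hazards(hazards: Sequence[float]) -> List[float]:
    """Turn discrete hazards lambda(1), lambda(2), ... into S(t) = prod_{s<=t} (1 - lambda(s))."""
    survival: List[float] = []
    s = 1.0
    for h in hazards:
        s *= 1.0 - h
        survival.append(s)
    return survival
```

A classifier fitted to the expanded rows estimates the discrete hazard, that is, the probability of an event in interval t given survival up to t. Because most rows carry y = 0, the binary outcome is typically highly imbalanced, which is what motivates a skew-insensitive split criterion.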

Highlights

  • Random survival forests (RSF, Ishwaran et al 2008) have become an established tool to model right-censored data in observational research

  • Unlike Schmid et al (2016a), who proposed to fit discrete-time survival trees using cardinality pruning in combination with the Gini impurity split criterion, we propose to build discrete-time RSF using unpruned trees with a small minimum node size

  • The Hellinger’s distance (HD) method resulted in smaller prediction errors than the Gini impurity (GI) method at all time points
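To make the contrast between the two split criteria concrete, the following sketch compares them on a single binary split. It uses one common formulation of Hellinger's distance for decision trees, based on the class-conditional proportions of events and non-events sent to each child node. Whether this matches the exact criterion implemented in the paper is an assumption, and the function names are hypothetical.

```python
import math


def hellinger_split(left_pos: int, left_neg: int, right_pos: int, right_neg: int) -> float:
    """Hellinger distance between class-conditional child-node distributions."""
    n_pos = left_pos + right_pos
    n_neg = left_neg + right_neg
    if n_pos == 0 or n_neg == 0:
        return 0.0  # undefined without both classes; treat as an uninformative split
    return math.sqrt(
        (math.sqrt(left_pos / n_pos) - math.sqrt(left_neg / n_neg)) ** 2
        + (math.sqrt(right_pos / n_pos) - math.sqrt(right_neg / n_neg)) ** 2
    )


def gini(pos: int, neg: int) -> float:
    """Gini impurity 2p(1 - p) of a node with pos events and neg non-events."""
    n = pos + neg
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def gini_decrease(left_pos: int, left_neg: int, right_pos: int, right_neg: int) -> float:
    """Decrease in Gini impurity achieved by the split (parent minus weighted children)."""
    n_left = left_pos + left_neg
    n_right = right_pos + right_neg
    n = n_left + n_right
    parent = gini(left_pos + right_pos, left_neg + right_neg)
    children = (n_left / n) * gini(left_pos, left_neg) + (n_right / n) * gini(right_pos, right_neg)
    return parent - children


# Same class-conditional proportions, ten times more non-events in the second case.
balanced = (8, 10, 2, 90)
skewed = (8, 100, 2, 900)
print(hellinger_split(*balanced), hellinger_split(*skewed))
print(gini_decrease(*balanced), gini_decrease(*skewed))
```

In this formulation Hellinger's distance depends only on the proportion of events and the proportion of non-events sent to each child, so multiplying the number of non-events leaves it unchanged. The Gini decrease depends on the class prior and shrinks as the imbalance grows.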

Summary

Introduction

Random survival forests (RSF, Ishwaran et al 2008) have become an established tool to model right-censored data in observational research. A remaining limitation of the RSF methodology is that RSF are almost entirely focussed on continuously measured event times. This limitation is relevant in observational studies with fixed follow-up intervals, where it is only known that events have occurred between two consecutive points in time. In these cases, event times are grouped (constituting a special case of interval censoring), and time is measured on a discrete scale t = 1, 2, .... As argued by many authors (e.g. Tutz and Schmid 2016; Bogaerts et al 2017; Berger et al 2018), the application of statistical models designed for continuous time-to-event data is not appropriate when interval censoring and/or grouping effects are ignored.

