Evaluating Random Forests for Survival Analysis using Prediction Error Curves.

Ulla B Mogensen, Thomas A Gerds, Hemant Ishwaran

doi:10.18637/jss.v050.i11

Abstract

Prediction error curves are increasingly used to assess and compare predictions in survival analysis. This article surveys the R package pec, which provides a set of functions for efficient computation of prediction error curves. The software implements inverse probability of censoring weights to deal with right-censored data and several variants of cross-validation to deal with the apparent error problem. In principle, all kinds of prediction models can be assessed, and the package readily supports most traditional regression modeling strategies, like Cox regression or additive hazard regression, as well as state-of-the-art machine learning methods such as random forests, a nonparametric method that provides a promising alternative to traditional strategies in low- and high-dimensional settings. We show how the functionality of pec can be extended to prediction models that are not yet supported. As an example, we implement support for random forest prediction models based on the R packages randomSurvivalForest and party. Using data from the Copenhagen Stroke Study, we use pec to compare random forests to a Cox regression model derived from stepwise variable selection. Reproducible results at the user level are given for publicly available data from the German breast cancer study group.

Highlights

Prediction error curves are increasingly used to assess and compare predictions in survival analysis
We present the R (R Development Core Team 2009) package pec, short for prediction error curves, that is available from the Comprehensive R Archive Network at http://CRAN
By using repeated data splitting, this yields estimates of the prediction error that are a composite of the prediction accuracy and the underlying variability of the prediction models due to whatever data dependent steps were used for their construction over the training splits of the data (Gerds and van de Wiel 2011)

Summary

Introduction

In this article we concentrate on prediction error curves, which are time-dependent estimates of the population average Brier score. The package provides functions for inverse probability of censoring weighted (IPCW) estimation of the time-dependent Brier score and has an option for selecting between ordinary cross-validation, leave-one-out bootstrap, and the .632+ bootstrap for estimating risk prediction performance. An important feature of pec is that the entire model building process can be taken into account in the evaluation of prediction error, including data-dependent steps such as variable selection, shrinkage, or tuning parameter estimation. With repeated data splitting (either cross-validation or bootstrap), this yields estimates of the prediction error that are a composite of the prediction accuracy and the underlying variability of the prediction models due to whatever data-dependent steps were used for their construction over the training splits of the data (Gerds and van de Wiel 2011). We compare the Cox prediction model obtained in this fashion to random forest prediction models.
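To make the IPCW idea concrete, the following is a minimal sketch in plain Python of a Brier score at one time point t for right-censored data. It is not the pec implementation. The function names (`censoring_survival`, `ipcw_brier`) are hypothetical, and the sketch assumes a marginal Kaplan-Meier estimate of the censoring distribution, predicted survival probabilities supplied by the user, and no special handling of tied event and censoring times. Subjects with an event by t are weighted by 1/G(T-), subjects still at risk after t by 1/G(t), and subjects censored before t get weight zero.

```python
def censoring_survival(times, events):
    """Kaplan-Meier estimate of the censoring survival function G(s) = P(C > s).

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the observation was censored
    Returns a function G(s, left_limit=False); left_limit=True gives G(s-).
    Simplification (assumption): ties between events and censorings are not
    treated specially.
    """
    censor_times = sorted({t for t, e in zip(times, events) if e == 0})
    steps = []
    surv = 1.0
    for u in censor_times:
        at_risk = sum(1 for t in times if t >= u)
        censored = sum(1 for t, e in zip(times, events) if t == u and e == 0)
        surv *= 1.0 - censored / at_risk
        steps.append((u, surv))

    def G(s, left_limit=False):
        value = 1.0
        for u, g in steps:
            if u < s or (u == s and not left_limit):
                value = g
            else:
                break
        return value

    return G


def ipcw_brier(times, events, pred_surv, t):
    """IPCW estimate of the Brier score at time t.

    pred_surv[i] is the model's predicted probability that subject i
    survives beyond t (hypothetical input, produced by any model).
    """
    G = censoring_survival(times, events)
    total = 0.0
    for time_i, event_i, s_i in zip(times, events, pred_surv):
        if time_i <= t and event_i == 1:
            # Event observed by t: true survival status at t is 0.
            total += (0.0 - s_i) ** 2 / G(time_i, left_limit=True)
        elif time_i > t:
            # Still event-free at t: true survival status at t is 1.
            total += (1.0 - s_i) ** 2 / G(t)
        # Censored before t: status at t unknown, weight zero.
    return total / len(times)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    times = [1, 2, 3, 4]
    events = [1, 0, 1, 1]
    pred = [0.2, 0.5, 0.8, 0.9]
    print(ipcw_brier(times, events, pred, t=2.5))
```

Evaluating `ipcw_brier` over a grid of time points gives a prediction error curve. Applying it to predictions made on held-out splits rather than on the training data is what addresses the apparent error problem.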

Data structure

Random forests

Extracting predicted survival probabilities

Random survival forest package: rsf

Party package: cforest

Writing new extensions

A predictSurvProb method has three required arguments:

Prediction error curves

Illustration

Reproducible results

Discussion

Choosing between cross-validation estimates

Estimation of weights

Model variability

Other packages

Alternative assessment measures

Further extensions


Journal: Journal of Statistical Software
Publication Date: Jan 1, 2012
Citations: 589
License type: cc-by

