Abstract

Prediction error curves are increasingly used to assess and compare predictions in survival analysis. This article surveys the R package pec, which provides a set of functions for efficient computation of prediction error curves. The software implements inverse probability of censoring weights to deal with right-censored data and several variants of cross-validation to deal with the apparent error problem. In principle, all kinds of prediction models can be assessed, and the package readily supports most traditional regression modeling strategies, such as Cox regression or additive hazard regression, as well as state-of-the-art machine learning methods such as random forests, a nonparametric method that offers a promising alternative to traditional strategies in low- and high-dimensional settings. We show how the functionality of pec can be extended to as yet unsupported prediction models. As an example, we implement support for random forest prediction models based on the R packages randomSurvivalForest and party. Using data from the Copenhagen Stroke Study, we use pec to compare random forests to a Cox regression model derived from stepwise variable selection. Reproducible results on the user level are given for publicly available data from the German Breast Cancer Study Group.

Highlights

  • Prediction error curves are increasingly used to assess and compare predictions in survival analysis

  • We present the R (R Development Core Team 2009) package pec, short for prediction error curves, that is available from the Comprehensive R Archive Network at http://CRAN

  • Repeated data splitting yields estimates of the prediction error that are a composite of the prediction accuracy and the underlying variability of the prediction models due to whatever data-dependent steps were used for their construction over the training splits of the data (Gerds and van de Wiel 2011)

Introduction

In this article we concentrate on prediction error curves, which are time-dependent estimates of the population average Brier score. The package provides functions for IPCW (inverse probability of censoring weighted) estimation of the time-dependent Brier score and has an option for selecting between ordinary cross-validation, leave-one-out bootstrap, and the .632+ bootstrap for estimating risk prediction performance. An important feature of pec is that the entire model building process can be taken into account in the evaluation of prediction error, including data-dependent steps such as variable selection, shrinkage, or tuning parameter estimation. Repeated data splitting (either cross-validation or bootstrap) yields estimates of the prediction error that are a composite of the prediction accuracy and the underlying variability of the prediction models due to whatever data-dependent steps were used for their construction over the training splits of the data (Gerds and van de Wiel 2011). We compare the Cox prediction model obtained in this fashion to random forest prediction models.
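
To make the IPCW idea concrete, the following is a minimal, language-neutral sketch of a Brier score estimate at a single time point t. It is not the code of pec and does not use its R interface. The function names (km_censoring, g_at, ipcw_brier) are hypothetical, and the censoring distribution is assumed to be estimated by a marginal Kaplan-Meier estimator, which is only one possible choice of weights. Subjects with an observed event before t are weighted by 1/G(T-), subjects still at risk at t are weighted by 1/G(t), and subjects censored before t receive weight zero.

```python
def km_censoring(times, events):
    """Kaplan-Meier estimate of the censoring survival function G.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the subject was censored
    Returns a list of (u, G(u)) steps at the distinct censoring times.
    Assumption: at tied times, events are taken to occur before censorings.
    """
    cens_times = sorted({t for t, d in zip(times, events) if d == 0})
    steps, g = [], 1.0
    for u in cens_times:
        # At risk of being censored at u: still under observation after u,
        # or censored exactly at u.
        at_risk = sum(1 for t, d in zip(times, events)
                      if t > u or (t == u and d == 0))
        n_cens = sum(1 for t, d in zip(times, events) if t == u and d == 0)
        g *= 1.0 - n_cens / at_risk
        steps.append((u, g))
    return steps


def g_at(steps, t, left_limit=False):
    """Evaluate the step function G at t, or its left limit G(t-)."""
    g = 1.0
    for u, value in steps:
        if u < t or (u == t and not left_limit):
            g = value
        else:
            break
    return g


def ipcw_brier(times, events, surv_pred, t):
    """IPCW estimate of the Brier score at time t.

    surv_pred[i] is the predicted probability that subject i survives
    beyond t, as produced by whatever prediction model is being assessed.
    """
    steps = km_censoring(times, events)
    g_t = g_at(steps, t)
    total = 0.0
    for obs_time, status, s in zip(times, events, surv_pred):
        if obs_time <= t and status == 1:
            # Event observed before t: the true survival status at t is 0.
            total += (0.0 - s) ** 2 / g_at(steps, obs_time, left_limit=True)
        elif obs_time > t:
            # Still event-free at t: the true survival status at t is 1.
            total += (1.0 - s) ** 2 / g_t
        # Censored before t: status at t is unknown, so the weight is zero.
    return total / len(times)


# Toy data (made up for illustration): the second subject is censored at 2.
toy_times = [1.0, 2.0, 3.0, 4.0]
toy_events = [1, 0, 1, 1]
toy_pred = [0.5, 0.5, 0.5, 0.5]  # an uninformative prediction
toy_score = ipcw_brier(toy_times, toy_events, toy_pred, t=2.5)
# The weights are 1, 0, 1.5, 1.5 and sum to n = 4, so toy_score is close to 0.25.
```

Evaluating such a score over a grid of time points gives a prediction error curve. Computing it on data that were not used to build the model, for example within cross-validation or bootstrap splits, addresses the apparent error problem mentioned in the abstract.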

Data structure
Random forests
Extracting predicted survival probabilities
Random survival forest package: rsf
Party package: cforest
Writing new extensions
A predictSurvProb method has three required arguments:
Prediction error curves
Illustration
Reproducible results
Discussion
Choosing between cross-validation estimates
Estimation of weights
Model variability
Other packages
Alternative assessment measures
Further extensions