Abstract
Prediction error curves are increasingly used to assess and compare predictions in survival analysis. This article surveys the R package pec, which provides a set of functions for efficient computation of prediction error curves. The software implements inverse probability of censoring weights to deal with right-censored data and several variants of cross-validation to deal with the apparent error problem. In principle, all kinds of prediction models can be assessed, and the package readily supports most traditional regression modeling strategies, like Cox regression or additive hazard regression, as well as state-of-the-art machine learning methods such as random forests, a nonparametric method that offers a promising alternative to traditional strategies in low- and high-dimensional settings. We show how the functionality of pec can be extended to as yet unsupported prediction models. As an example, we implement support for random forest prediction models based on the R packages randomSurvivalForest and party. Using data from the Copenhagen Stroke Study, we use pec to compare random forests to a Cox regression model derived from stepwise variable selection. Reproducible results on the user level are given for publicly available data from the German breast cancer study group.
Highlights
Prediction error curves are increasingly used to assess and compare predictions in survival analysis
We present the R (R Development Core Team 2009) package pec, short for prediction error curves, that is available from the Comprehensive R Archive Network at http://CRAN
Repeated data splitting yields estimates of the prediction error that are a composite of the prediction accuracy and the underlying variability of the prediction models due to whatever data-dependent steps were used for their construction over the training splits of the data (Gerds and van de Wiel 2011)
Summary
In this article we concentrate on prediction error curves that are time-dependent estimates of the population average Brier score. The package provides functions for IPCW estimation of the time-dependent Brier score and has an option for selecting between ordinary cross-validation, leave-one-out bootstrap, and the .632+ bootstrap for estimating risk prediction performance. An important feature of pec is that the entire model building process can be taken into account in the evaluation of prediction error, including data-dependent steps such as variable selection, shrinkage, or tuning parameter estimation. With repeated data splitting (either cross-validation or bootstrap), this yields estimates of the prediction error that are a composite of the prediction accuracy and the underlying variability of the prediction models due to whatever data-dependent steps were used for their construction over the training splits of the data (Gerds and van de Wiel 2011). We compare a Cox prediction model obtained by stepwise variable selection to random forest prediction models.
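To make the idea of an IPCW estimate of the time-dependent Brier score concrete, the following is a minimal Python sketch. It is not the pec implementation and does not call pec. The function names `censoring_km` and `ipcw_brier` are hypothetical, the censoring distribution is assumed to be independent of the covariates and is estimated with a plain Kaplan-Meier estimator, and ties between event and censoring times are not treated specially. Subjects with an observed event before time t are weighted by 1/G(T_i-), subjects still at risk at t by 1/G(t), and subjects censored before t receive weight zero.

```python
import numpy as np


def censoring_km(times, events):
    """Kaplan-Meier estimate of the censoring survival function G(t) = P(C > t).

    Censored observations (events == 0) play the role of "events" here.
    Simplifying assumption: no special handling of tied event/censoring times.
    """
    times = np.asarray(times, dtype=float)
    censored = 1 - np.asarray(events, dtype=int)
    uniq = np.unique(times)
    at_risk = np.array([(times >= u).sum() for u in uniq])
    n_cens = np.array([censored[times == u].sum() for u in uniq])
    surv = np.cumprod(1.0 - n_cens / at_risk)

    def G(t, left_limit=False):
        # Number of unique observed times <= t (or < t for the left limit G(t-)).
        side = "left" if left_limit else "right"
        idx = np.searchsorted(uniq, t, side=side)
        return np.where(idx == 0, 1.0, surv[np.maximum(idx - 1, 0)])

    return G


def ipcw_brier(times, events, surv_pred, t):
    """IPCW estimate of the Brier score at time t.

    surv_pred[i] is the predicted probability that subject i survives beyond t.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    surv_pred = np.asarray(surv_pred, dtype=float)
    G = censoring_km(times, events)

    died = (times <= t) & (events == 1)   # event observed by time t
    alive = times > t                     # still at risk at time t

    weights = np.zeros_like(times)        # censored before t: weight 0
    weights[died] = 1.0 / G(times[died], left_limit=True)
    weights[alive] = 1.0 / G(t)

    status = alive.astype(float)          # observed survival status at t
    return float(np.mean(weights * (status - surv_pred) ** 2))


# Toy data: four subjects, the second one is censored at time 2.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 0, 1, 1]
print(ipcw_brier(times, events, [0.5, 0.5, 0.5, 0.5], t=2.5))
```

Evaluating `ipcw_brier` over a grid of time points gives one prediction error curve. For an honest estimate, `surv_pred` would need to come from a model fitted on training data that excludes the subjects being scored, which is the purpose of the cross-validation and bootstrap options described above.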