Abstract

High-dimensional variable selection in the proportional hazards (PH) model has many successful applications in different areas. In practice, data may involve confounding variables that do not satisfy the PH assumption, in which case the stratified proportional hazards (SPH) model can be adopted to control the confounding effects by stratification without directly modeling the confounding effects. However, there is a lack of computationally efficient statistical software for high-dimensional variable selection in the SPH model. In this work an R package, SurvBoost, is developed to implement the gradient boosting algorithm for fitting the SPH model with high-dimensional covariate variables. Simulation studies demonstrate that in many scenarios SurvBoost can achieve better selection accuracy and reduce computational time substantially compared to the existing R package that implements boosting algorithms without stratification. The proposed R package is also illustrated by an analysis of gene expression data with survival outcome in The Cancer Genome Atlas study. In addition, a detailed hands-on tutorial for SurvBoost is provided.

Highlights

  • Variable selection for high-dimensional survival data has become increasingly important in a variety of research areas

  • The stratified proportional hazards (SPH) model has a wide range of applications for survival analysis, but no computationally efficient statistical software are available for high-dimensional variable selection in the SPH model

  • We introduce a new R package SurvBoost which implements the gradient boosting algorithm for high-dimensional variable selection in the stratified proportional hazards (SPH) model, while most existing R packages, such as mboost only focus on the proportional hazards model

Read more

Summary

Introduction

Variable selection for high-dimensional survival data has become increasingly important in a variety of research areas. The R package mboost has been developed and become a powerful tool for variable selection and parameter estimation in complex parametric and nonparametric models via the boosting methods (Hothorn et al, 2017) It has been widely used in many applications. The SPH model has a wide range of applications for survival analysis, but no computationally efficient statistical software are available for high-dimensional variable selection in the SPH model To fill this gap, we develop an R package, SurvBoost, to implement the gradient boosting algorithm for fitting the SPH model with high-dimensional covariates with adjusting confounding variables. Additional options are available in the SurvBoost package to determine an appropriate stopping criteria for the algorithm Another useful function assists in selecting stratification variables, which may improve model fitting results. ISSN 2073-4859 data with survival outcome in The Cancer Genome Atlas (TCGA) study

Methods
Method
Findings
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call