Abstract

Accurately predicting patient risk and identifying survival biomarkers are two important tasks in survival analysis. For emerging high-throughput gene expression data, the random survival forest (RSF) is attracting growing attention: it performs well on survival prediction problems with high-dimensional variables, and it can identify important variables using a variable importance measure that is calculated automatically within the algorithm. However, RSF still has limitations, including limited predictive accuracy on independent datasets and limited biological interpretability of the survival biomarkers it selects. In this study, we integrated gene interaction information into a Reweighted RSF model (RRSF) to improve predictive accuracy and identify biologically meaningful survival markers. We applied RRSF to survival prediction for patients with glioblastoma multiforme (GBM) and esophageal squamous cell carcinoma (ESCC). With a reconstructed global pathway network and an mRNA-lncRNA co-expression network as the prior gene interaction information, RRSF showed better overall predictive performance than RSF on three GBM and two ESCC datasets. In addition, RRSF identified a two-gene signature for GBM and a three-lncRNA signature for ESCC, each of which showed robust prognostic value and high biological relevance to the development of the respective cancer.

Highlights

  • Predicting the clinical outcome and response to treatment is a central challenge in clinical cancer research

  • To evaluate the proposed method, the Reweighted RSF (RRSF) model was applied to survival prediction in two cancers, glioblastoma multiforme (GBM) and esophageal squamous cell carcinoma (ESCC)

  • GBM-The Cancer Genome Atlas (TCGA)-test, GSE4412, and GSE4271 were used to evaluate the predictive performance of the RRSF model


Summary

Introduction

Predicting the clinical outcome and response to treatment is a central challenge in clinical cancer research. With the emergence of high-throughput gene expression data, the random survival forest (RSF) is attracting increased attention. It has shown excellent performance on survival prediction problems with high-dimensional variables, and it can cope with complex interaction structures as well as highly correlated variables[5]. RSF can also rank variables according to their variable importance (VIMP), which reflects their ability to predict the outcome and is calculated automatically within the RSF algorithm[4]. These features are considered important advantages given the complexity of high-throughput gene expression data. Survival prediction models may be impaired if they use such genes as predictors. To overcome these problems, researchers have proposed integrating gene interaction information into prediction models. We propose a novel pipeline that integrates gene interaction information into a Reweighted RSF (RRSF) approach to improve predictive performance and select important genes associated with survival. We applied RRSF to patients with glioblastoma multiforme (GBM) and esophageal squamous cell carcinoma (ESCC) to evaluate its predictive performance and identify biomarkers.
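
The idea of ranking variables by how much they contribute to prediction can be illustrated with a small sketch. The summary does not state how VIMP is computed, so the code below assumes a common formulation: permutation importance, measured as the drop in Harrell's concordance index when one variable is shuffled. The function names (`concordance_index`, `permutation_vimp`), the toy data, and the stand-in risk model `toy_risk` are all hypothetical; this is not the RSF or RRSF implementation, and it involves no forest and no gene interaction weights.

```python
import random


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs whose risk ordering matches survival."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when sample i had an observed event
            # strictly before sample j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


def permutation_vimp(predict_risk, X, times, events, n_repeats=20, seed=0):
    """Mean drop in C-index after shuffling each column in turn (assumed VIMP definition)."""
    rng = random.Random(seed)
    base = concordance_index(times, events, [predict_risk(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the data with only this column permuted.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            c_perm = concordance_index(times, events, [predict_risk(r) for r in X_perm])
            drops.append(base - c_perm)
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (hypothetical): column 0 drives risk, column 1 is noise.
X = [[0.1, 5.0], [0.4, 1.0], [0.5, 3.0], [0.7, 2.0], [0.9, 4.0], [1.2, 0.5]]
times = [60, 48, 40, 30, 18, 6]   # follow-up time, e.g. in months
events = [0, 1, 1, 0, 1, 1]       # 1 = death observed, 0 = censored


def toy_risk(row):
    # Stand-in for a fitted survival model: risk depends on column 0 only.
    return row[0]


baseline = concordance_index(times, events, [toy_risk(row) for row in X])
vimp = permutation_vimp(toy_risk, X, times, events)
print("baseline C-index:", baseline)
print("VIMP per column:", vimp)
```

In this toy setup the noise column receives an importance of exactly zero because the stand-in model never reads it, while shuffling the informative column lowers the concordance index. A reweighting scheme such as the one proposed here would additionally use prior gene interaction information, which this sketch leaves out.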

