Abstract

Predicting cancer survival from molecular data is an important aspect of biomedical research because it allows quantifying patient risks and thus individualizing therapy. We introduce XGBoost tree ensemble learning to predict survival from transcriptome data of 8,024 patients from 25 different cancer types and show highly competitive performance with state-of-the-art methods. To further improve the plausibility of the machine learning approach, we took two additional steps. In the first step, we applied pan-cancer training and showed that it substantially improves prognosis compared with cancer subtype-specific training. In the second step, we applied network propagation and inferred a pan-cancer survival network consisting of 103 genes. This network highlights cross-cohort features and is predictive of the tumor microenvironment and immune status of the patients. Our work demonstrates that pan-cancer learning combined with network propagation generalizes over multiple cancer types and identifies biologically plausible features that can serve as biomarkers for monitoring cancer survival.

Highlights

  • Patient survival is the ultimate goal of cancer therapy, and predicting patient survival from the molecular features of the individual tumor is an important computational task that has implications for tumor progression, therapy, and patient care (Hoadley et al, 2018).

  • Pan-cancer studies have been conducted (The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium, 2020) and have shown that patients with different cancer subtypes share common regulatory mechanisms and features, for example, at the level of signaling pathways (Sanchez-Vega et al, 2018).

  • Our study shows that gradient tree boosting can be efficiently applied to pan-cancer survival prediction and that the combination of machine learning and network propagation can identify biologically meaningful subnetworks that highlight the importance of the tumor microenvironment (TME) for patient survival.
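To make the idea of network propagation more concrete, the following is a minimal sketch of one common variant, a random walk with restart, in plain Python. It is an illustration under stated assumptions, not the procedure used in the study: the gene names, the toy network, the seed scores (standing in for per-gene importance values from a trained model), and the `restart` parameter are all hypothetical.

```python
def propagate_scores(adjacency, seed_scores, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected gene network.

    adjacency   -- dict mapping each gene to a list of neighbouring genes
                   (every neighbour must itself be a key of the dict)
    seed_scores -- dict mapping genes to non-negative starting scores
                   (hypothetical stand-in for model feature importances)
    restart     -- probability of jumping back to the seed distribution
    """
    genes = sorted(adjacency)
    total = sum(seed_scores.get(g, 0.0) for g in genes)
    if total <= 0:
        raise ValueError("at least one gene needs a positive seed score")

    # Normalise the seeds so the scores form a probability distribution.
    p0 = {g: seed_scores.get(g, 0.0) / total for g in genes}
    p = dict(p0)

    for _ in range(max_iter):
        # Restart term: a fixed fraction of the mass returns to the seeds.
        nxt = {g: restart * p0[g] for g in genes}
        for g in genes:
            neighbours = adjacency[g]
            if not neighbours:
                # Isolated gene: its remaining mass stays where it is.
                nxt[g] += (1.0 - restart) * p[g]
                continue
            # Spread the remaining mass evenly over the neighbours.
            share = (1.0 - restart) * p[g] / len(neighbours)
            for n in neighbours:
                nxt[n] += share
        delta = sum(abs(nxt[g] - p[g]) for g in genes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: a chain G1 - G2 - G3 plus an isolated gene G4.
toy_network = {"G1": ["G2"], "G2": ["G1", "G3"], "G3": ["G2"], "G4": []}
# Hypothetical seed: only G1 carries importance from the predictive model.
toy_importance = {"G1": 1.0}

scores = propagate_scores(toy_network, toy_importance)
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

In this toy setting, genes closer to the seed gene receive higher propagated scores, and a gene with no connection to the seed receives none. Thresholding such scores is one way a subnetwork could be extracted, although the threshold and the underlying interaction network would have to be chosen for the data at hand.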

Introduction

Patient survival is the ultimate goal of cancer therapy, and predicting patient survival from the molecular features of the individual tumor is an important computational task that has implications for tumor progression, therapy, and patient care (Hoadley et al, 2018). Large population studies have shown that cancer survival is a multi-factorial problem and varies broadly between cancer types (Allemani et al, 2015, 2018). This has given rise to the identification of numerous gene expression signatures specific for given cancer subtypes or treatments; such signatures are often poorly reproducible (Venet et al, 2011) and depend on the statistical approach, the individual patient cohorts used, and even the normalization of the data (Patil et al, 2015). In addition to gene expression signatures, it has recently been reported that mutation-based signatures lack biological cause and interpretation (Alexandrov et al, 2020; Kim et al, 2020). This is due to the rather small sample sizes of subtype-specific patient cohorts and the high inter-patient heterogeneity of molecular features within the cancer subtypes. This suggests that there might be common patterns for survival prognosis across different cancer types, and only recently have survival prognoses based on pan-cancer approaches been developed (Cheerla and Gevaert, 2019; Kim et al, 2020).

