Abstract

Background
Genomic profiling of solid human tumors by projects such as The Cancer Genome Atlas (TCGA) has provided important information regarding the somatic alterations that drive cancer progression and patient survival. Although researchers have successfully leveraged TCGA data to build prognostic models, most efforts have focused on specific cancer types and a targeted set of gene-level predictors. Less is known about the prognostic ability of pathway-level variables in a pan-cancer setting. To address these limitations, we systematically evaluated and compared the prognostic ability of somatic point mutation (SPM) and copy number variation (CNV) data, using both gene-level and pathway-level models, across a diverse set of TCGA cancer types and predictive modeling approaches.

Results
We evaluated gene-level and pathway-level penalized Cox proportional hazards models using SPM and CNV data for 29 different TCGA cohorts, measuring predictive accuracy as the concordance index for survival outcomes. Our comprehensive analysis suggests that pathway-level predictors did not offer superior predictive power relative to gene-level models in every cancer type, but they had the advantages of robustness and parsimony. We identified a set of cohorts for which somatic alterations could not predict prognosis, as well as one distinctive cohort, LGG, for which SPM data were more predictive than CNV data and predictive accuracy was good for all model types. We also found that pathway-level predictors provide superior interpretative value, and that gene-level models often suffer from serious collinearity, an issue that pathway-level models avoid.

Conclusion
Our comprehensive analysis suggests that when using somatic alteration data for cancer prognosis prediction, pathway-level models are more interpretable, stable and parsimonious than gene-level models. Pathway-level models also avoid the issue of collinearity, which can be serious for gene-level somatic alteration predictors.
The prognostic power of somatic alterations is highly variable across cancer types, and we identified a set of cohorts for which somatic alterations could not predict prognosis. In general, CNV data predict prognosis better than SPM data, with the exception of the LGG cohort.
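
The concordance index (C-index) used above as the measure of predictive accuracy can be illustrated with a minimal sketch of Harrell's C for right-censored survival data. The sketch assumes that a higher risk score means shorter expected survival, and it simply skips pairs with tied survival times. The function name harrell_c_index is hypothetical; the authors may have used a different estimator or an existing library implementation.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[bool],
                    risk: Sequence[float]) -> float:
    """Harrell's C: among comparable pairs, the fraction in which the subject
    with the higher predicted risk experiences the event first."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the earlier time is an observed event
        # (not censored); pairs with tied times are skipped in this sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5  # tied risk scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Four patients; the third is censored at time 15.
times = [5, 10, 15, 20]
events = [True, True, False, True]
risk = [0.9, 0.6, 0.4, 0.1]
print(harrell_c_index(times, events, risk))  # 1.0
```

A value of 0.5 corresponds to random ranking (the "close to null" predictive power described for some cohorts), while 1.0 means the model ranks every comparable pair correctly.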

Highlights

  • Genomic profiling of solid human tumors by projects such as The Cancer Genome Atlas (TCGA) has provided important information regarding the somatic alterations that drive cancer progression and patient survival

  • Somatic alterations can be classified into two types: somatic point mutations (SPM), which include single nucleotide variants and indels that affect only one or a few nucleotides, and somatic copy number variations (CNV), which involve larger contiguous portions of the genome being either lost or duplicated [7]

  • The Lower Grade Glioma (LGG) cohort performed remarkably well for all models, especially the gene-level SPM models. In contrast, for cohorts such as UVM and KIRP, the SPM-only models have close to null predictive power using either gene-level or pathway-level predictors


Summary

Introduction

Genomic profiling of solid human tumors by projects such as The Cancer Genome Atlas (TCGA) has provided important information regarding the somatic alterations that drive cancer progression and patient survival. Advances in high-throughput technologies have helped to identify and characterize the genomic landscape of human cancers. Large collaborative projects, such as TCGA, have characterized gene expression, mutation, copy number, miRNA, and methylation features from over 20,000 primary cancers and adjacent normal samples spanning 33 cancer types [1]. For CNV, a similar measure is copy number alteration burden, which indicates the degree to which a tumor’s genome is altered as a percentage of genome length [7]. Both of these measures are sample-wise measurements that give an overall score to each sample, and both discard gene-specific information. To characterize more fully the alterations that jointly affect prognosis, we propose using gene set enrichment methods to aggregate the information to the pathway level, so that a score is given for each pathway in each sample.
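
The contrast between a sample-wise burden score and a per-pathway score can be sketched in Python under stated assumptions. The hypothetical cna_burden function computes a copy number alteration burden as the percentage of segmented genome length whose absolute log2 copy-number ratio exceeds a threshold; the 0.2 cutoff and the use of total segmented length as a stand-in for genome length are assumptions. The hypothetical pathway_scores function is a deliberately simple proportion-based score (the fraction of a pathway's genes altered in a sample). It is not a gene set enrichment method such as ssGSEA or GSVA, and it only shows how gene-level alterations can collapse into one score per pathway per sample. All sample names, pathway names, and gene groupings are illustrative.

```python
from typing import Dict, List, Set, Tuple

# (chromosome, start, end, log2 copy-number ratio); half-open interval [start, end)
Segment = Tuple[str, int, int, float]


def cna_burden(segments: List[Segment], threshold: float = 0.2) -> float:
    """Percentage of segmented genome length with |log2 ratio| > threshold.
    Assumption: the 0.2 cutoff is illustrative, and total segmented length
    stands in for genome length."""
    total = sum(end - start for _, start, end, _ in segments)
    altered = sum(end - start for _, start, end, lr in segments
                  if abs(lr) > threshold)
    return 100.0 * altered / total if total else 0.0


def pathway_scores(altered_genes: Dict[str, Set[str]],
                   pathways: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """One score per pathway per sample: the fraction of the pathway's genes
    altered in that sample (a simple stand-in, not an enrichment method)."""
    return {
        sample: {name: len(genes & members) / len(members)
                 for name, members in pathways.items() if members}
        for sample, genes in altered_genes.items()
    }


segments = [("1", 0, 100, 0.05), ("1", 100, 300, -0.6), ("2", 0, 100, 0.3)]
print(cna_burden(segments))  # 75.0

# Hypothetical pathways and per-sample altered gene sets.
pathways = {"PATHWAY_A": {"TP53", "MDM2", "CDKN2A", "RB1"},
            "PATHWAY_B": {"KRAS", "BRAF"}}
altered = {"S1": {"TP53", "CDKN2A", "KRAS"}, "S2": {"BRAF"}}
scores = pathway_scores(altered, pathways)
print(scores["S1"])  # {'PATHWAY_A': 0.5, 'PATHWAY_B': 0.5}
```

Unlike the single burden value, the pathway scores keep one feature per pathway, which is the kind of predictor matrix a penalized Cox model could then use.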
