Abstract
Background
Cancer prognosis prediction is valuable for patients and clinicians because it allows them to appropriately manage care. A promising direction for improving the performance and interpretation of expression-based predictive models involves the aggregation of gene-level data into biological pathways. While many studies have used pathway-level predictors for cancer survival analysis, a comprehensive comparison of pathway-level and gene-level prognostic models has not been performed. To address this gap, we characterized the performance of penalized Cox proportional hazards models built using either pathway- or gene-level predictors for the cancers profiled in The Cancer Genome Atlas (TCGA) and pathways from the Molecular Signatures Database (MSigDB).
Results
When analyzing TCGA data, we found that pathway-level models are more parsimonious, more robust, more computationally efficient and easier to interpret than gene-level models with similar predictive performance. For example, both pathway-level and gene-level models have an average Cox concordance index of ~0.85 for the TCGA glioma cohort; however, the gene-level model has twice as many predictors on average, its predictor composition is less stable across cross-validation folds, and its estimation takes 40 times as long as that of the pathway-level model. When the complex correlation structure of the data is broken by permutation, the pathway-level model has greater predictive performance while still retaining superior interpretative power, robustness, parsimony and computational efficiency relative to the gene-level models. For example, the average concordance index of the pathway-level model increases to 0.88 while that of the gene-level model falls to 0.56 for the TCGA glioma cohort using survival times simulated from uncorrelated gene expression data.
Conclusion
The results of this study show that when the correlations among gene expression values are low, pathway-level analyses can yield better predictive performance, greater interpretative power, more robust models and lower computational cost than a gene-level model. When correlations among genes are high, a pathway-level analysis provides predictive power equivalent to that of a gene-level analysis while retaining the advantages of interpretability, robustness and computational efficiency.
Highlights
Cancer prognosis prediction is valuable for patients and clinicians because it allows them to appropriately manage care.
We analyzed three combinations of the subtype cohorts: colon and rectum adenocarcinoma (COADREAD), which combines the colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ) datasets; brain lower grade glioma and glioblastoma multiforme (GBMLGG), which combines the brain lower grade glioma (LGG) and glioblastoma multiforme (GBM) datasets; and lung cancer (LUNG), which combines the lung squamous cell carcinoma (LUSC) and lung adenocarcinoma (LUAD) datasets.
Similar to the simulation studies, predictive performance on the real survival data was quantified by the concordance index (CI), which was averaged over 50 replications of 5-fold nested cross-validation.
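To make the concordance index mentioned above concrete, the following is a minimal sketch of Harrell's C for right-censored survival data. It is an illustration only and not the authors' implementation: the function name `concordance_index`, the toy data, and the choice to skip pairs with tied survival times are all assumptions made here for simplicity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored data (illustrative sketch).

    times: observed follow-up times
    events: 1 if the event (death) was observed, 0 if censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk went with earlier event
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four subjects, the third one censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.7, 0.1]
ci = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. In a setup like the one described, a score of this kind would be computed on each held-out fold and then averaged across folds and replications.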
Summary
Cancer prognosis prediction is valuable for patients and clinicians because it allows them to appropriately manage care. A promising direction for improving the performance and interpretation of expression-based predictive models involves the aggregation of gene-level data into biological pathways. While many studies have used pathway-level predictors for cancer survival analysis, a comprehensive comparison of pathway-level and gene-level prognostic models has not been performed. With the advent of high-throughput profiling technologies, a new challenge has emerged: extracting information from a huge number of expressed genes and proteins. One approach to this challenge has been to group genes by biological function into smaller sets of pathways, a process called pathway analysis or gene set testing [11]. Pathway-level variables are more readily interpreted since they represent biologically meaningful groups of genes, e.g., the genes involved in a specific signaling pathway or the genes whose expression is upregulated in response to a specific chemical perturbation.
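The idea of collapsing gene-level data into pathway-level variables can be sketched as follows. The aggregation method used in the study is not stated in this summary, so averaging the member genes per sample is an assumption chosen only for illustration; the function name `pathway_scores`, the pathway names, and the expression values are likewise hypothetical.

```python
from statistics import mean


def pathway_scores(expression, pathways):
    """Collapse gene-level expression into one score per pathway and sample.

    expression: dict mapping gene name -> list of per-sample values
    pathways: dict mapping pathway name -> list of member gene names
    Assumption: the pathway score is the plain mean of its measured genes.
    """
    scores = {}
    for name, genes in pathways.items():
        present = [g for g in genes if g in expression]
        if not present:
            continue  # no member gene was measured, so no score is produced
        n_samples = len(expression[present[0]])
        scores[name] = [
            mean(expression[g][s] for g in present) for s in range(n_samples)
        ]
    return scores


# Hypothetical toy input: three genes measured in two samples.
expression = {
    "TP53": [1.0, 3.0],
    "MDM2": [3.0, 5.0],
    "EGFR": [2.0, 2.0],
}
pathways = {
    "P53_SIGNALING": ["TP53", "MDM2", "CDKN1A"],  # CDKN1A is not measured
    "EGFR_SIGNALING": ["EGFR"],
    "UNMEASURED_SET": ["XYZ1"],
}
scores = pathway_scores(expression, pathways)
```

The resulting pathway-level matrix has one column per pathway instead of one per gene, which is what makes the downstream model smaller and its predictors easier to interpret.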