Abstract

Quantitative high-throughput data deposited in consortia such as the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) present opportunities and challenges for computational analyses. We present a computational strategy to systematically rank and investigate a large number (2^10 to 2^20) of clinically testable gene sets, using combinatorial gene subset generation and disease-free survival (DFS) analyses. This approach integrates protein-protein interaction networks, gene expression, DNA methylation, and copy number data with DFS profiles from patient clinical records. As a case study, we applied this pipeline to systematically analyze the role of ALDH1A2 in prostate cancer (PCa). We have previously found this gene to have multiple roles in disease and homeostasis; here we investigate the role of the associated ALDH1A2 gene/protein networks in PCa, using our methodology in combination with PCa patient clinical profiles from the ICGC and TCGA databases. Relationships between gene signatures and relapse were analyzed using Kaplan-Meier (KM) log-rank analysis and multivariable Cox regression. z-statistics were calculated from expression relative to the pooled mean of the diploid population. Gene/protein interaction network analyses generated 11 core genes associated with ALDH1A2; combinatorial ranking of the power set of these core genes identified two gene sets (out of 2^11 - 1 = 2,047 combinations) that correlated significantly with disease relapse (KM log-rank p < 0.05). For the more significant of these two sets, referred to as the optimal gene set (OGS), patients with OGS alterations have a median survival of 62.7 months, compared to >150 months for patients without OGS alterations (p = 0.0248, hazard ratio = 2.213, 95% confidence interval = 1.1-4.098). Two genes comprising the OGS (CYP26A1 and RDH10) are strongly associated with ALDH1A2 in the retinoic acid (RA) pathways, suggesting a major role of RA signaling in early PCa progression.
Our pipeline complements human expertise in the search for prognostic biomarkers in large-scale datasets.
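
The three computational steps named in the abstract (power-set enumeration, z-scoring against a diploid reference, and a two-group log-rank comparison) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the gene labels `G1` to `G11` are placeholders (the abstract does not list the 11 core genes), the use of the sample standard deviation is an assumption, and the rule for deciding which patients count as "altered" for a given gene set is not specified here and is left out.

```python
from itertools import combinations
from math import erfc, sqrt
from statistics import mean, stdev


def nonempty_subsets(genes):
    """Yield every non-empty subset of `genes` as a tuple (2**n - 1 in total)."""
    for size in range(1, len(genes) + 1):
        yield from combinations(genes, size)


def z_scores(values, reference):
    """z-score each value against a reference group (e.g. diploid samples).

    Assumption: the sample standard deviation of the reference is the scale.
    """
    mu = mean(reference)
    sd = stdev(reference)
    return [(v - mu) / sd for v in values]


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square, p-value) with 1 d.f.

    `times_*` are follow-up times; `events_*` are 1 for relapse, 0 for censored.
    """
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance is undefined for a single subject
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))


# Placeholder labels; the real core genes are not given in the abstract.
core_genes = [f"G{i}" for i in range(1, 12)]
candidate_sets = list(nonempty_subsets(core_genes))
print(len(candidate_sets))  # 2047
```

In a ranking loop, each candidate set would split the cohort into altered and unaltered patients, `logrank` would be applied to the two groups, and the sets would be sorted by p-value. With 2,047 tests, some correction for multiple comparisons would normally be considered before interpreting any single p < 0.05.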

Highlights

  • Large volumes of cancer genomic data are being continuously generated via consortia such as The Cancer Genome Atlas (TCGA) [1] and the International Cancer Genome Consortium (ICGC) [2], and optimal use of this data promises improvement to patient care [3]

  • ALDH1A2 is a key player in the retinoic acid (RA) pathway and retinoid metabolism, both known to be important in homeostasis and cellular function [32, 33], the disruption of which leads to various health problems including prostate cancer (PCa) [34, 35]

  • As the number of large-scale genomics datasets increases exponentially due to decreasing experimental costs, the main limitation now lies in our capacity to extract relevant information


Summary

Introduction

Large volumes of cancer genomic data are being continuously generated via consortia such as The Cancer Genome Atlas (TCGA) [1] and the International Cancer Genome Consortium (ICGC) [2], and optimal use of these data promises to improve patient care [3]. Recent studies of different cancer patient cohorts have incorporated machine learning techniques such as decision trees [7] and Bayesian belief networks [8, 9]. These techniques are computationally intensive, frequently rely on heuristics to explore the gene-set space, and commonly suffer from small patient cohorts [10].

