Abstract B1-04: Co-expression profiling and transcriptional network evidences from RNA-Seq data reveal specific molecular subtype features in breast cancer

Biju Issac,Enrico Capobianco,Nicholas F Tsinoremas

doi:10.1158/1538-7445.compsysbio-b1-04

Abstract

Expression profiling is regarded as the gold standard for breast cancer subtypes, but the recent advent of integrative multi-omics is challenging the validity of findings based on clustering approaches, the stability of the identified groups, the overall reproducibility of the molecular subtypes associated with clusters, and the impact on both diagnostic practice and therapy. The purpose of the study is to elucidate the relevance of method integration, moving from the limitations of co-expression dynamics revealed by clustering to the high potential of transcriptional network analysis aimed at merging gene expression signatures with oncogenic pathway activity, and to better define distinct disease features, such as molecular subtypes. Paired-end RNA-Seq data from 106 solid normal and 124 solid tumor breast cancer samples were obtained from The Cancer Genome Atlas (TCGA). Tissue type for all samples is infiltrating ductal carcinoma from female patients. Median age of patients is 57 yrs for normal samples and 52.5 yrs for tumor samples. All samples were sequenced on Illumina HiSeq 2000 and each sample contains on average ~100 million reads. Only reads with mapping quality (MAQ) score ≥20 were used, and the mapping rate for these reads was above 90% against the repeat-masked human transcriptome (build hg19). Fragments Per Kilobase per Million reads (FPKM) values were computed using TopHat and Cufflinks software. Comparison between normal and tumor samples generated 2344 significantly differentially expressed genes (DEGs; p-value ≤ 0.05; fold-change ≥ 2). The data cascade delivered several expression biotype categories, following ENSEMBL classification. Our focus was initially directed to the identification of molecular subtypes associated with clusters. The clusters were treated as seed classes informing on the target phenotypes, but needing refinement.
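The two quantities named above, FPKM and the DEG thresholds, can be illustrated with a minimal Python sketch. This is not the authors' pipeline (the abstract states only that TopHat and Cufflinks were used, and does not name the statistical test behind the p-values). The function names `fpkm`, `fold_change`, and `is_deg` are hypothetical, the treatment of zero expression is an assumption, and the fold-change threshold is assumed to apply in either direction.

```python
import math


def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments per kilobase of gene length per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def fold_change(mean_normal, mean_tumor):
    """Symmetric fold change (always >= 1) between two non-negative mean FPKM values.

    Assumption: a gene silent in both groups gets fold change 1,
    and a gene silent in exactly one group gets an infinite fold change.
    """
    low, high = sorted((mean_normal, mean_tumor))
    if high == 0:
        return 1.0
    if low == 0:
        return math.inf
    return high / low


def is_deg(mean_normal, mean_tumor, p_value, p_max=0.05, min_fold=2.0):
    """Apply the thresholds quoted in the abstract: p-value <= 0.05 and fold-change >= 2.

    The p-value is taken as given; how it is computed is outside this sketch.
    """
    return p_value <= p_max and fold_change(mean_normal, mean_tumor) >= min_fold


# Hypothetical numbers: 500 fragments on a 2 kb gene in a library of 100 million fragments.
example_fpkm = fpkm(500, 2000, 100_000_000)
example_call = is_deg(mean_normal=10.0, mean_tumor=20.0, p_value=0.05)
```

Dedicated differential-expression tools model count dispersion and correct for multiple testing, so a plain threshold filter like this one only mirrors the final selection step.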
We explored typical associations formed through hierarchical clustering, tested the robustness of the approach, and assessed the accuracy of sample assignment to clusters. Then, we analyzed transcriptional modularity and, by constraining the search space of the problem, identified a list of candidate modules that reflect molecular subtype signatures. We annotated these modules and found distinctive features at both the functional enrichment and pathway levels by using several software tools and visualization frameworks (ClueGO, EnrichmentMap, Cytoscape, etc.). We demonstrate that the limitations of hierarchical clustering methods for molecular subtyping can in part be bypassed by complementing the expression profiles with transcriptional modules that allow for multiple annotations and useful visualizations. It is therefore important to stress a general point: integrative analysis is a key factor that relies not just on increasing data dimensionality (i.e., bringing multi-omics into play), but also on refining the data cross-fertilization and analysis fusion performed at the intra-omic scale, before dealing with inter-omics harmonization.

Citation Format: Biju Issac, Nicholas F. Tsinoremas, Enrico Capobianco. Co-expression profiling and transcriptional network evidences from RNA-Seq data reveal specific molecular subtype features in breast cancer. [abstract]. In: Proceedings of the AACR Special Conference on Computational and Systems Biology of Cancer; Feb 8-11 2015; San Francisco, CA. Philadelphia (PA): AACR; Cancer Res 2015;75(22 Suppl 2):Abstract nr B1-04.
