Abstract

Breast cancer subtyping using gene expression is well established in breast cancer research and is gaining traction in the clinical setting. While large chromosomal regions are known to be affected by copy number polymorphisms, histone modifications, and other spanning alterations, it is not clear whether expression patterns regulate such regional changes. We present a method to integrate any type of expression data (here, mRNA, lincRNA, and mRNA and lincRNA together) and to quantify long-range expression patterns affecting large regions of the genome. TCGA alignment and gene expression RNA-Seq data for breast cancer were generated at the Carolina Center for Genome Sciences, UNC at Chapel Hill. We examined 715 samples, each of which had at least partial data for ER/PR/HER2 status and complete data for PAM50 subtype assignment. Our method defines long-range expression within a window of a particular length (e.g. 100 Kb, 1 Mb). We take the mean weighted expression value of all genes that fall within each window and concatenate these windows to obtain larger chromosome-wide patterns. The final chromosome-wide vectors are joined to represent long-range expression patterns across the whole genome. We retain the top 10% most varying windows. Then, we apply hierarchical clustering, perform survival analysis, and evaluate enrichment of clinically meaningful subtypes using a hypergeometric test. Hierarchical clustering in each analysis revealed clear separation of all PAM50-classified breast cancer subtypes at 1 Megabase resolution in the available data set. Interestingly, clustering of samples (n = 715) using 247 bins revealed distinct subgroups at each level of analysis: mRNA, lincRNA, and mRNA plus lincRNA.
At each of these levels, three clusters were significantly enriched for HER2-amplified (mRNA, p=1.5E-35; lincRNA, p=1.8E-26; mRNA + lincRNA, p=1.4E-33), Normal-like (mRNA, p=8.9E-82; lincRNA, p=1E-71; mRNA + lincRNA, p=1.6E-77), and Basal-like (mRNA, p=9.2E-67; lincRNA, p=6.9E-93; mRNA + lincRNA, p=8.2E-72) breast cancer. In view of the association of these mRNA clusters with PAM50 classifications, it is surprising that fewer than 10% of the genes in the analysis overlapped with the PAM50 gene set (42 of the 465 genes intersected with the 1734 genes in the original PAM50 study). The Luminal clusters exhibited a more diverse clustering pattern; however, the lincRNA and combined analyses were able to separate Luminal A from Luminal B and to resolve them into several subclusters. Interestingly, these subclusters differed in overall survival, particularly among the Luminal A/B mixed subgroups in the lincRNA (about a 16% difference in 5-year overall survival) and combined (a 10% difference in 5-year overall survival) analyses. Hierarchical clustering relying on long-range expression regions at 1 Megabase resolution produces clusters that are enriched with well-known clinically relevant subtypes. A surprising finding is the ability of this method to recover existing PAM50 subtypes from non-coding, intergenic regions. Of special interest is the separation of Luminal tumors into groups with different survival profiles using this method. To our knowledge, this is the first study that attempts to analyze and reveal existing and novel breast cancer subtypes across large regions of the genome and in long intergenic non-coding regions.

Citation Format: Mankovich AR, Dimitrova N. Long-range expression analysis reveals new luminal subgroups associated with different patient outcomes. [abstract]. In: Proceedings of the Thirty-Eighth Annual CTRC-AACR San Antonio Breast Cancer Symposium: 2015 Dec 8-12; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2016;76(4 Suppl):Abstract nr P6-05-05.
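
The windowing, variance filtering, and enrichment steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how expression values are weighted, so weighting each gene by the number of bases it overlaps with a window is an assumption made here. The function names (`window_means`, `top_variable`, `hypergeom_enrichment`), the half-open gene coordinates, and the choice of population for the enrichment test are likewise hypothetical. Hierarchical clustering and survival analysis are omitted; they would normally be delegated to a statistics library.

```python
from math import comb
from statistics import pvariance


def window_means(genes, window=1_000_000):
    """Weighted mean expression per fixed-size window on one chromosome.

    genes: iterable of (start, end, expression) with half-open coordinates
    [start, end) and end > start. ASSUMPTION: each gene is weighted by the
    number of bases it overlaps with the window; the abstract does not
    specify the weighting scheme.
    """
    sums, weights = {}, {}
    for start, end, expr in genes:
        first, last = start // window, (end - 1) // window
        for w in range(first, last + 1):
            lo = max(start, w * window)
            hi = min(end, (w + 1) * window)
            overlap = hi - lo
            sums[w] = sums.get(w, 0.0) + overlap * expr
            weights[w] = weights.get(w, 0) + overlap
    return {w: sums[w] / weights[w] for w in sums}


def top_variable(matrix, fraction=0.10):
    """Keys of the most variable windows across samples.

    matrix: {window_key: [value for each sample]}.
    Keeps the top `fraction` of windows by variance, and at least one.
    """
    ranked = sorted(matrix, key=lambda k: pvariance(matrix[k]), reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


def hypergeom_enrichment(population, labelled, cluster_size, hits):
    """Upper-tail hypergeometric p-value P(X >= hits).

    population:   total number of samples (e.g. 715)
    labelled:     samples carrying the subtype label of interest
    cluster_size: samples in the cluster being tested
    hits:         samples in the cluster that carry the label
    """
    upper = min(labelled, cluster_size)
    favourable = sum(
        comb(labelled, i) * comb(population - labelled, cluster_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, cluster_size)
```

To use the sketch, call `window_means` once per chromosome and per sample, collect the per-window values into a dictionary keyed by (chromosome, window index), pass that dictionary to `top_variable`, and then cluster the samples on the retained windows. `hypergeom_enrichment` can then be applied to each cluster and subtype label in turn.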
