Abstract

Background: Breast cancer subtyping by gene expression is well established. Although large chromosomal regions are known to be affected by copy number polymorphisms in breast cancer, it is not clear whether expression patterns reflect large genomic events affecting longer portions of a chromosome. We present a method to quantify long-range gene expression patterns over larger genomic regions at 23 kb, 100 kb and 1 Mb.

Methods: We used TCGA level 2 breast cancer gene expression data (RNA-Seq) generated at the Carolina Center for Genome Sciences, UNC at Chapel Hill. We evaluated the long-range expression patterns of 649 patients for whom ER, PR and Her2 status by IHC was available. For 221 samples, ER, PR, Her2, age, menopausal status, and p53 and PIK3CA mutation status were all available. Our method defines long-range expression within a window of a particular length (e.g. 23 kb, 100 kb). We take the mean expression score of all genes that fall within each window and concatenate these windows to obtain chromosome-wide patterns. The chromosome-wide vectors are then concatenated to represent long-range expression patterns across the entire genome. We evaluate the variation of these window scores across all samples and keep the top 2% most variable windows. Finally, we apply hierarchical clustering and evaluate enrichment of clinically meaningful subtypes using a hypergeometric test.

Results: Simple hierarchical clustering revealed clear separation of triple negative breast cancer (TNBC) samples at both resolutions examined (23 kb and 100 kb) in the available data set. While the 100 kb resolution showed two distinct clusters, the 23 kb resolution showed three. Hierarchical clustering of all samples (n=649) using the 214 (top 2%) most variable 100 kb regions revealed a cluster of 99 samples that was enriched for TNBC: it contained 75 of the 102 TNBC samples in the entire set (p=1.7E-53).
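As a rough check on the 100 kb enrichment figure above, the hypergeometric upper tail can be computed directly from the four counts given in the text (649 samples, 102 TNBC, a cluster of 99, 75 TNBC in that cluster). This is a minimal sketch: it assumes a one-sided test, P(X >= k), which the abstract does not specify, and the function name `hypergeom_tail` is illustrative rather than taken from the authors' code.

```python
from math import comb

def hypergeom_tail(N, K, n, k):
    """P(X >= k) when n items are drawn without replacement
    from a population of N that contains K 'successes'."""
    upper = min(n, K)
    # Exact integer count of draws containing at least k successes.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    # Python's int / int division stays accurate even for very large integers.
    return favorable / comb(N, n)

# Counts taken from the 100 kb result in the abstract.
# Assumption: one-sided upper-tail test.
p_100kb = hypergeom_tail(N=649, K=102, n=99, k=75)
print(f"{p_100kb:.1e}")
```

If the one-sided assumption matches the authors' test, the printed value should be of the same order of magnitude as the reported p-value. The same function applies to the 23 kb clusters by substituting their counts.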
At the 23 kb resolution, the 221 samples yielded 319 (top 2%) highly varying regions, which segregated the samples into three clusters: Cluster 1, a luminal-enriched cluster with 147 samples, of which 104 are ER+PR+Her2- (p << 0.01); Cluster 2, a Her2+ enriched cluster with 31 samples, of which 23 are Her2+ (p << 0.01); and Cluster 3, a TNBC-enriched cluster with 33 samples, of which 26 are TNBC (p=1.5E-18), out of 38 TNBC samples in total. Furthermore, of the 33 samples in the TNBC cluster, 28 had p53 mutations, and 24 of those were both TNBC and p53-mutated (p << 0.05). This TNBC cluster was also enriched for samples lacking PIK3CA mutations (n=31) (p << 0.01). We found no clear association with age or menopausal status.

Conclusions: Hierarchical clustering based on long-range expression regions produces clusters that are enriched for well-known, clinically relevant subtypes. The associations with known tumor biology are strong at both region sizes examined (23 kb and 100 kb). This is the first study to report long-range gene expression patterns that reveal a data-driven, close association with tumor biology.

Citation Format: Alex Mankovich, Vartika Agrawal, Nilanjana Banerjee, Nevenka Dimitrova. Long range expression patterns detected by RNASeq in breast cancer reveal cluster enriched with triple negative breast cancer [abstract]. In: Proceedings of the Thirty-Seventh Annual CTRC-AACR San Antonio Breast Cancer Symposium: 2014 Dec 9-13; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2015;75(9 Suppl):Abstract nr P2-03-14.
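The windowing and filtering steps described in the Methods can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. A gene is assigned to the window containing its start coordinate (the abstract does not say how genes spanning a boundary are handled), variation is measured as across-sample variance, and the names `window_scores` and `top_varying` and the toy expression values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance

def window_scores(genes, window_bp):
    """genes: iterable of (chrom, start_bp, [expression per sample]).
    Returns {(chrom, window_index): [mean expression per sample]}."""
    buckets = defaultdict(list)
    for chrom, start, expr in genes:
        # Assumption: a gene belongs to the window that contains its start.
        buckets[(chrom, start // window_bp)].append(expr)
    # Average the genes in each window, sample by sample.
    return {key: [mean(col) for col in zip(*rows)] for key, rows in buckets.items()}

def top_varying(scores, fraction=0.02):
    """Return the window keys with the highest across-sample variance."""
    ranked = sorted(scores, key=lambda key: pvariance(scores[key]), reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]

# Toy input with three samples; the values are made up for illustration.
genes = [
    ("chr1", 5_000, [1.0, 1.0, 1.0]),
    ("chr1", 20_000, [3.0, 1.0, 1.0]),
    ("chr1", 150_000, [0.0, 10.0, 0.0]),
    ("chr2", 40_000, [2.0, 2.0, 2.5]),
]
scores = window_scores(genes, window_bp=100_000)
selected = top_varying(scores, fraction=0.5)

# Genome-wide vector for sample 0: windows ordered by chromosome, then position.
genome_vector_sample0 = [scores[key][0] for key in sorted(scores)]
```

The selected windows would then form the feature matrix passed to a hierarchical clustering routine. The abstract does not state the linkage method or distance metric, so those choices are left open here.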
