Abstract

Background: Breast cancer is a heterogeneous disease comprising several biologically different types, exhibiting diverse responses to treatment. In recent years, gene expression profiling has led to the definition of several “intrinsic subtypes” of breast cancer (basal-like, HER2-enriched, luminal-A, luminal-B and normal-like), and microarray-based predictors such as PAM50 have been developed. Despite their advantage over traditional histopathological classification, precise identification of breast cancer subtypes, especially within the largest and highly variable luminal-A class, remains a challenge. In this study, we revisited the molecular classification of breast tumors using both expression and methylation data obtained from The Cancer Genome Atlas (TCGA).

Methods: Unsupervised clustering was applied to 1148 and 679 breast cancer samples using RNA-Seq and DNA methylation data, respectively. Clusters were evaluated using clinical information and by comparison to PAM50 subtypes. Differentially expressed genes and differentially methylated CpGs were tested for enrichment using various annotation sets. Survival analysis was conducted on the identified clusters using the log-rank test and Cox proportional hazards model.

Results: The clusters in both expression and methylation datasets had only moderate agreement with PAM50 calls, while our partitioning of the luminal samples had better five-year prognostic value than the luminal-A/luminal-B assignment as called by PAM50. Our analysis partitioned the expression profiles of the luminal-A samples into two biologically distinct subgroups exhibiting differential expression of immune-related genes, with one subgroup carrying significantly higher risk for five-year recurrence. Analysis of the luminal-A samples using methylation data identified a cluster of patients with poorer survival, characterized by distinct hyper-methylation of developmental genes. Cox multivariate survival analysis confirmed the prognostic significance of the two partitions after adjustment for commonly used factors such as age and pathological stage.

Conclusions: Modern genomic datasets reveal large heterogeneity among luminal breast tumors. Our analysis of these data provides two prognostic gene sets that dissect and explain tumor variability within the luminal-A subgroup, thus contributing to the advancement of subtype-specific diagnosis and treatment.

Electronic supplementary material: The online version of this article (doi:10.1186/s13058-016-0724-2) contains supplementary material, which is available to authorized users.
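
For readers unfamiliar with the log-rank test named in the Methods, the following is a minimal sketch of the standard two-group version. It is not the authors' pipeline: the function name `logrank_test`, the toy follow-up times, and the event flags are illustrative assumptions, and no TCGA data are used.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value).

    times_*  : follow-up times per patient
    events_* : 1 if the event (e.g. recurrence) was observed, 0 if censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0       # hypergeometric variance summed over event times
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, B
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Toy example (made-up numbers): group A has events early, group B late.
chi2_sep, p_sep = logrank_test([1, 2, 3, 4, 5], [1] * 5,
                               [10, 11, 12, 13, 14], [1] * 5)
# Identical groups should show no difference.
chi2_same, p_same = logrank_test([1, 2, 3], [1] * 3, [1, 2, 3], [1] * 3)
```

A small p-value suggests that the survival curves of the two groups differ. Adjusting for covariates such as age and pathological stage requires a Cox proportional hazards model, which this sketch does not cover.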

Highlights

  • Breast cancer is a heterogeneous disease comprising several biologically different types, exhibiting diverse responses to treatment

  • Separation of luminal-A and luminal-B samples is not reconstructed by RNA-Seq unsupervised analysis. We started by evaluating the global sample structure within the RNA-Seq gene expression data obtained from The Cancer Genome Atlas (TCGA)

  • This study emphasizes the large heterogeneity of luminal breast tumors in general, and of luminal-A samples in particular, the inner variability of which was found to be inadequately captured by PAM50 molecular subtypes


Summary

Introduction

Breast cancer is a heterogeneous disease comprising several biologically different types, exhibiting diverse responses to treatment. Gene expression profiling has led to the definition of several “intrinsic subtypes” of breast cancer (basal-like, HER2-enriched, luminal-A, luminal-B and normal-like), and microarray-based predictors such as PAM50 have been developed. Despite their advantage over traditional histopathological classification, precise identification of breast cancer subtypes, especially within the largest and highly variable luminal-A class, remains a challenge. With the emergence of global molecular profiling techniques, large genomic datasets became available for subtype discovery using unsupervised algorithms. In this methodology, breast samples are partitioned into subgroups using clustering algorithms, such as hierarchical clustering [3] or K-Means, and subgroup significance is evaluated using the clinical data associated with the samples.
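
The cluster-then-evaluate workflow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's implementation: it uses plain Lloyd-style K-Means, the helper names `kmeans` and `purity` are hypothetical, the two-feature "expression" vectors are made up, and agreement with reference subtype labels is measured by simple cluster purity rather than by any metric named in the source.

```python
import random
from collections import Counter

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def kmeans(samples, k, n_iter=50, seed=0):
    """Lloyd's K-Means; returns one cluster index per sample."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(samples, k)]
    labels = [0] * len(samples)
    for it in range(n_iter):
        # Assignment step: each sample goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(s, centroids[j]))
            for s in samples
        ]
        if it > 0 and new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def purity(cluster_labels, reference_labels):
    """Fraction of samples that carry the majority reference label of their cluster."""
    by_cluster = {}
    for c, r in zip(cluster_labels, reference_labels):
        by_cluster.setdefault(c, Counter())[r] += 1
    majority = sum(max(cnt.values()) for cnt in by_cluster.values())
    return majority / len(cluster_labels)

# Made-up two-gene profiles for six samples, with hypothetical reference calls.
samples = [[1.0, 1.2], [0.8, 1.1], [1.3, 0.9],
           [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]]
reference = ["A", "A", "A", "B", "B", "B"]

labels = kmeans(samples, k=2)
agreement = purity(labels, reference)
```

On real expression data the number of clusters, the distance measure, and the feature selection all influence the partition, so a purity score like this is only a first check before survival or enrichment analysis.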
