Abstract

Simple Summary: Gene expression profiling of tumors is an essential approach for the selection of biomarkers and the investigation of the molecular mechanisms of cancer, but transcriptomic results are often difficult to reproduce due to technical biases, sample heterogeneity, or small sample sizes. Combining many datasets can help to reduce artefacts and improve statistical power. Therefore, we aimed to create a comprehensive resource of transcriptomic datasets investigating breast cancers, focusing on microdissected tumors, which make it possible to distinguish the contribution of the tumor microenvironment from that of cancer cells. We define robust lists of differentially expressed genes and describe their relationships with clinical features in each cellular compartment, identifying clinically relevant markers that can only be retrieved by measuring their expression in the tumor microenvironment alone.

Transcriptome data provide a valuable resource for the study of cancer molecular mechanisms, but technical biases, sample heterogeneity, and small sample sizes result in poorly reproducible lists of regulated genes. Additionally, the presence of multiple cellular components contributing to cancer development complicates the interpretation of bulk transcriptomic profiles. To address these issues, we collected 48 microarray datasets derived from laser capture microdissected stroma or epithelium in breast tumors and performed a meta-analysis identifying robust lists of differentially expressed genes. This was used to create a database with carefully harmonized metadata that we make freely available to the research community. As predicted, combining the results of multiple datasets improved statistical power. Moreover, the separate analysis of stroma and epithelium allowed the identification of genes with different contributions in each compartment, which would not be detected by bulk analysis due to their distinct regulation in the two compartments. Our method can help in the discovery of biomarkers and the identification of functionally relevant genes in both the stroma and the epithelium. The database is readily accessible through a user-friendly web interface.

Highlights

  • High-throughput analyses of gene expression hold great promise for the identification of biomarkers of clinical status, with the potential of predicting outcome, response to therapy, or informing researchers about molecular mechanisms underpinning disease onset and progression and identifying therapeutic targets [1]

  • We collected 48 transcriptomic datasets of breast tumors or breast hyperplasias deposited in the Gene Expression Omnibus (GEO) database, selecting experiments where different cellular compartments were separated prior to RNA extraction

  • Despite the difficulty of accurately detecting the signal deriving from a specific cellular compartment in bulk samples, as discussed above, we showed in bulk samples that the epithelial and vascular signatures are independent predictors of a patient’s disease-free survival (DFS) (Figure 7)

Introduction

High-throughput analyses of gene expression hold great promise for the identification of biomarkers of clinical status, with the potential of predicting outcome, response to therapy, or informing researchers about molecular mechanisms underpinning disease onset and progression and identifying therapeutic targets [1]. Lists of candidate genes obtained through transcriptome-based studies have proven difficult to reproduce [2,3,4,5,6], raising a note of caution regarding conclusions drawn from single sets of experiments. Sample collection and processing methods, protocols, and platforms may affect the resulting gene signatures, making them non-overlapping between studies [7]. Additional variability may be introduced by patient heterogeneity, which is not sufficiently represented in small samples.
