Abstract

Breast cancer is a highly heterogeneous disease that can be classified into multiple subtypes based on the tumor transcriptome. Most of the subtyping schemes used in clinics today are derived from analyses of microarray data from thousands of different tumors together with clinical data for the patients from which the tumors were isolated. However, RNA sequencing (RNA‐Seq) is gradually replacing microarrays as the preferred transcriptomics platform, and although transcript abundances measured by the two different technologies are largely compatible, subtyping methods developed for probe‐based microarray data are incompatible with RNA‐Seq as input data. Here, we present an RNA‐Seq data processing pipeline, which relies on the mapping of sequencing reads to the probe set target sequences instead of the human reference genome, thereby enabling probe‐based subtyping of breast cancer tumor tissue using sequencing‐based transcriptomics. By analyzing 66 breast cancer tumors for which gene expression was measured using both microarrays and RNA‐Seq, we show that RNA‐Seq data can be directly compared to microarray data using our pipeline. Additionally, we demonstrate that the established subtyping method CITBCMST (Guedj et al., 2012), which relies on a 375 probe set‐signature to classify samples into the six subtypes basL, lumA, lumB, lumC, mApo, and normL, can be applied without further modifications. This pipeline enables a seamless transition to sequencing‐based transcriptomics for future clinical purposes.

Highlights

  • Breast cancer is a highly heterogeneous disease with several clinical subtypes defined by transcriptomic expression profiles that correlate with pathogenesis, clinical features, and prognosis (Goldhirsch et al, 2013; Parker et al, 2009; Stratton et al, 2009)

  • We found that mapping RNA sequencing (RNA-Seq) reads to the array probe set sequences, then quantile-normalizing to a target distribution consisting of the mean probe intensities in the microarray training data, and finally batch-correcting together with the microarray training data (Fig. 1), resulted in a very high correlation between microarray- and RNA-Seq-based abundances (R² = 0.9445 and Spearman’s ρ = 0.9638) and in subtyping highly similar to that obtained from the corresponding microarray data

  • The RNA Breast Cancer (RNABC) pipeline resulted in 51 out of the 57 paired samples matching on predicted CITBCMST subtype, while the remaining six samples were mismatches
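
The quantile normalization step named in the highlights can be illustrated with a minimal sketch. This is not the RNABC implementation: the function name, the handling of ties, and the toy numbers are assumptions made purely for illustration. The idea is that each RNA-Seq value is replaced by the target value of the same rank, so the normalized sample takes on the target distribution (here standing in for the mean probe intensities of the microarray training data).

```python
def quantile_normalize_to_target(sample, target):
    """Replace each value in `sample` with the value of equal rank in `target`.

    Hypothetical helper for illustration only. Ties in `sample` are broken
    by order of appearance, which is an assumption of this sketch.
    """
    if len(sample) != len(target):
        raise ValueError("sample and target must have the same length")

    # Target distribution sorted from smallest to largest.
    target_sorted = sorted(target)

    # Indices of the sample values, ordered from smallest to largest value.
    order = sorted(range(len(sample)), key=lambda i: sample[i])

    # The value with rank r in the sample receives the rank-r target value.
    normalized = [0.0] * len(sample)
    for rank, idx in enumerate(order):
        normalized[idx] = target_sorted[rank]
    return normalized


# Toy values, one per probe set (made up, not data from the study).
rnaseq_abundances = [5, 0, 20, 1]
target_distribution = [2.0, 8.0, 4.0, 6.0]

normalized = quantile_normalize_to_target(rnaseq_abundances, target_distribution)
print(normalized)  # [6.0, 2.0, 8.0, 4.0]
```

The rank order of the probe sets within the sample is preserved; only the scale changes, which is what allows the RNA-Seq values to sit on the same scale as the microarray intensities before batch correction.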


Introduction

Breast cancer is a highly heterogeneous disease with several clinical subtypes defined by transcriptomic expression profiles that correlate with pathogenesis, clinical features, and prognosis (Goldhirsch et al, 2013; Parker et al, 2009; Stratton et al, 2009). The use of DNA microarrays has been pivotal to cancer research for the past decades, but transcriptomics is moving toward RNA sequencing (RNA-Seq), as this technique allows for quantification of previously uncharacterized transcripts, as well as novel genetic aberrations such as fusion genes and alternative …

Abbreviations: CITBCMST, CIT breast cancer molecular subtypes; FPKM, fragments per kilobase per million; MAD, median absolute deviation; PCA, principal component analysis; PREBS, probe region expression estimation based on sequencing; QC, quality control; RMA, robust multi-array average; RNABC, RNA breast cancer; RNA-Seq, RNA sequencing; SCC, Spearman’s rank correlation coefficient; TDM, training distribution matching; TPM, transcripts per million.
