Abstract

Background: While global breast cancer gene expression data sets have considerable commonality in their data content, the populations they represent and the data collection methods used can be quite disparate. We sought to assess the extent and consequence of these systematic differences with respect to identifying clinically significant prognostic groups.

Methods: Using four well-characterized breast cancer data sets, we ascertained how effectively unsupervised clustering based on randomly generated sets of genes could segregate tumors into prognostic groups.

Results: Using a common set of 5,000 randomly generated lists (70 genes/list), the percentages of clusters with significant differences in metastasis latencies (HR p-value < 0.01) were 62%, 15%, 21% and 0% in the NKI2 (Netherlands Cancer Institute), Wang, TRANSBIG and KJX64/KJ125 data sets, respectively. Among ER positive tumors, the percentages were 38%, 11%, 4% and 0%, respectively. Few random lists were predictive among ER negative tumors in any data set. Clustering was associated with ER status and, after globally adjusting for the effects of ER-α gene expression, the percentages were 25%, 33%, 1% and 0%, respectively. The impact of adjusting for ER status depended on the extent of confounding between ER-α gene expression and markers of proliferation.

Conclusion: A statistically significant association between a given gene list and prognosis is highly likely to be found in the NKI2 data set because of its large sample size and the interrelationship between ER-α expression and markers of proliferation. In most respects, the TRANSBIG data set generated outcomes similar to those of the NKI2 data set, although its smaller sample size led to fewer statistically significant results.
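The random-list procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, so plain 2-means stands in for it, and a two-group log-rank test is swapped in for the hazard-ratio p-value. The function names (`two_means`, `logrank_p`, `fraction_significant`) and the input layout are hypothetical.

```python
import math
import random


def two_means(rows, iters=20, rng=random):
    """Split samples into two clusters with plain 2-means (an assumed
    stand-in for the unspecified unsupervised clustering)."""
    centers = rng.sample(rows, 2)
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [
            min((0, 1), key=lambda k: sum((x - c) ** 2
                                          for x, c in zip(row, centers[k])))
            for row in rows
        ]
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:  # keep the old center if a cluster empties out
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def logrank_p(times, events, labels):
    """Two-group log-rank test; returns the p-value (chi-square, 1 df).
    Swapped in here for the hazard-ratio p-value used in the paper."""
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [lab for ti, lab in zip(times, labels) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if e and ti == t)
        d1 = sum(1 for ti, e, lab in zip(times, events, labels)
                 if e and ti == t and lab == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0  # no information, e.g. every sample in one cluster
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(obs_minus_exp ** 2 / var / 2))


def fraction_significant(expr, times, events, n_lists=200, list_size=70,
                         alpha=0.01, seed=0):
    """Fraction of random gene lists whose two clusters differ in outcome.

    expr maps gene name -> list of per-tumor expression values (hypothetical
    layout); times/events are per-tumor follow-up times and event flags.
    """
    rng = random.Random(seed)
    genes = sorted(expr)
    hits = 0
    for _ in range(n_lists):
        chosen = rng.sample(genes, list_size)
        # One row per tumor, one column per selected gene.
        rows = [list(vals) for vals in zip(*(expr[g] for g in chosen))]
        labels = two_means(rows, rng=rng)
        if logrank_p(times, events, labels) < alpha:
            hits += 1
    return hits / n_lists
```

Calling `fraction_significant` with `n_lists=5000` and `list_size=70` would mirror the list count and size quoted in the Results, with the percentage given by the returned fraction times 100. Under the null hypothesis of no cluster-outcome association, roughly 1% of lists would be expected to pass a 0.01 threshold, which is the baseline against which the reported percentages can be read.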

Highlights

  • While global breast cancer gene expression data sets have considerable commonality in terms of their data content, the populations that they represent and the data collection methods utilized can be quite disparate

  • Marked differences in the ability to segregate good and poor prognosis tumors were observed between the data sets using randomly generated gene lists of various sizes

  • The hazard ratio estimates, which show the magnitude of the differences in rates of metastasis between the tumors in each cluster pair, tended to be larger in the TRANSBIG data set

Introduction

While global breast cancer gene expression data sets have considerable commonality in their data content, the populations they represent and the data collection methods used can be quite disparate. A large number of global gene expression data sets of human breast cancers have become publicly available [1,2,3,4,5,6]. These data sets have provided a wealth of information for the generation and testing of biological and clinical hypotheses [7]. Clinical and pathological factors relevant to breast cancer are extensively characterized, and their prognostic significance is reflected in these publicly available data sets. These factors include tumor grade and the expression of HER2 and the estrogen receptor (ER) [8]. The consistent prognostic efficacy of a proliferation signature is well established [13,14].
