Abstract

BackgroundTraditional clustering approaches for gene expression data are not well adapted to address the complexity and heterogeneity of tumors, where small sets of genes may be aberrantly co-expressed in specific subsets of tumors. Biclustering algorithms that perform local clustering on subsets of genes and conditions help address this problem. We propose a graph-based Tunable Biclustering Algorithm (TuBA) based on a novel pairwise proximity measure, examining the relationship of samples at the extremes of genes’ expression profiles to identify similarly altered signatures.ResultsTuBA's predictions are consistent in 3,940 breast invasive carcinoma samples from 3 independent sources, using different technologies for measuring gene expression (RNA sequencing and Microarray). More than 60% of biclusters identified independently in each dataset had significant agreement in their gene sets, as well as similar clinical implications. Approximately 50% of biclusters were enriched in the estrogen receptor−negative/HER2-negative (or basal-like) subtype, while >50% were associated with transcriptionally active copy number changes. Biclusters representing gene co-expression patterns in stromal tissue were also identified in tumor specimens.ConclusionsTuBA offers a simple biclustering method that can identify biologically relevant gene co-expression signatures not captured by traditional unsupervised clustering approaches. It complements biclustering approaches that are designed to identify constant or coherent submatrices in gene expression datasets, and outperforms them in identifying a multitude of altered transcriptional profiles that are associated with observed genomic heterogeneity of diseased states in breast cancer, both within and across tumor subtypes, a promising step in understanding disease heterogeneity, and a necessary first step in individualized therapy.

Highlights

  • Traditional clustering approaches for gene expression data are not well adapted to address the complexity and heterogeneity of tumors, where small sets of genes may be aberrantly co-expressed in specific subsets of tumors

  • 39 Conclusion: Tunable Biclustering Algorithm (TuBA) offers a simple biclustering method that can identify biologically relevant gene co-expression signatures not captured by traditional unsupervised clustering approaches

  • It complements biclustering approaches that are designed to identify constant or coherent submatrices in gene expression datasets, and outperforms them in identifying a multitude of altered transcriptional profiles that are associated with observed genomic heterogeneity of diseased states in breast cancer, both within and across tumor subtypes, a promising step in understanding disease heterogeneity, and a necessary first step in individualized therapy

Read more

Summary

Introduction

Measures of similarity (such as the Pearson correlation coefficient, Spearman correlation coefficient, Mutual Information, etc.) are employed to quantify the level of similarity between every pair of genes (or samples) across all samples (or genes) Such an approach is known as global clustering. In case of datasets with a heterogeneous assortment of samples, only a small subset of genes in a fraction of the total set of samples may be co-regulated in specific cellular processes. This is especially true for diseases like cancer that manifest a plethora of diseased phenotypes. Global clustering is not an optimal approach to identify co-expressed sets of genes or samples in gene expression datasets associated with heterogeneous diseases

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.