Comprehensive analysis of forty yeast microarray datasets reveals a novel subset of genes (APha-RiB) consistently negatively associated with ribosome biogenesis.

Basel Abu-Jamous, Asoke K Nandi, Rui Fa, David J Roberts

doi:10.1186/1471-2105-15-322

Abstract

Background: The scale and complexity of genomic data lend themselves to analysis using sophisticated mathematical techniques to yield information that can generate new hypotheses and so guide further experimental investigations. An ensemble clustering method has the ability to perform consensus clustering over the same set of genes from different microarray datasets by combining results from different clustering methods into a single consensus result.

Results: In this paper we have performed a comprehensive analysis of forty yeast microarray datasets. The recently described Bi-CoPaM method can analyse expressions of the same set of genes from various microarray datasets while using different clustering methods, and then combine these results into a single consensus result whose clusters’ tightness is tunable from tight, specific clusters to wide, overlapping clusters. This method has been applied in a novel way to genome-wide data from forty yeast microarray datasets to discover two clusters of genes that are consistently co-expressed over all of these datasets, which come from different biological contexts and various experimental conditions. Most strikingly, the average expression profiles of those clusters are consistently negatively correlated in all of the forty datasets, while neither profile leads or lags the other.

Conclusions: The first cluster is enriched with ribosomal biogenesis genes. The biological processes of most of the genes in the second cluster are either unknown or apparently unrelated, although they show high connectivity in protein-protein and genetic interaction networks. Therefore, it is possible that this mostly uncharacterised cluster and the ribosomal biogenesis cluster are transcriptionally oppositely regulated by some common machinery. Moreover, we anticipate that the genes included in this previously unknown cluster participate in generic, in contrast to specific, stress response processes. These novel findings illuminate coordinated gene expression in yeast and suggest several hypotheses for future experimental functional work. Additionally, we have demonstrated the usefulness of the Bi-CoPaM-based approach, which may be helpful for the analysis of other groups of (microarray) datasets from other species and systems for the exploration of global genetic co-expression.

Electronic supplementary material: The online version of this article (doi:10.1186/1471-2105-15-322) contains supplementary material, which is available to authorized users.

Highlights

The scale and complexity of genomic data lend themselves to analysis using sophisticated mathematical techniques to yield information that can generate new hypotheses and so guide further experimental investigations
To amplify the variation in cluster assignment caused by the differences in microarray datasets over the one caused by the differences amongst clustering methods, the partitions generated by applying different clustering methods over any single microarray dataset are first combined into a single intermediate fuzzy consensus partition matrix (CoPaM) whose membership values are processed by pushing them towards the binary values of zero and one (Figure 1); this is mathematically formulated as u i;j
Our results, based on a Bi-CoPaM analysis of forty different and recent yeast microarray datasets, each measuring the expression of the yeast genome (~6000 genes) over multiple time-points or conditions, illustrate that the two most consistently co-expressed subsets of S. cerevisiae genes are the ribosomal biogenesis regulon (RRB) and a subset of genes which is in antiphase with ribosome biogenesis (APha-RiB)
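
The intermediate step described in the second highlight (combining the partitions from several clustering methods over one dataset into a fuzzy CoPaM, then pushing the memberships towards zero and one) can be illustrated with a minimal Python sketch. It rests on assumptions not taken from the paper: the partitions are given as cluster-by-gene membership matrices whose clusters have already been aligned across methods, the consensus is a plain average, and the "pushing" rule (raise to a power, then renormalise per gene) is only a stand-in, since the authors' own formula is not reproduced here. The names `consensus_partition`, `sharpen` and `power` are hypothetical.

```python
from typing import List

# A partition matrix: rows are clusters, columns are genes, entries are memberships.
Partition = List[List[float]]


def consensus_partition(partitions: Partition_list if False else List[Partition]) -> Partition:
    """Average several partitions of one dataset into a fuzzy consensus matrix.

    Assumption: row k refers to the same cluster in every partition.
    """
    n_methods = len(partitions)
    n_clusters = len(partitions[0])
    n_genes = len(partitions[0][0])
    return [
        [sum(p[k][g] for p in partitions) / n_methods for g in range(n_genes)]
        for k in range(n_clusters)
    ]


def sharpen(copam: Partition, power: float = 2.0) -> Partition:
    """Push each gene's memberships towards 0 and 1.

    Illustrative rule only (not the paper's formula): raise each membership
    to `power`, then renormalise so a gene's memberships sum to one.
    """
    n_clusters = len(copam)
    n_genes = len(copam[0])
    out = [[0.0] * n_genes for _ in range(n_clusters)]
    for g in range(n_genes):
        raised = [copam[k][g] ** power for k in range(n_clusters)]
        total = sum(raised)
        for k in range(n_clusters):
            out[k][g] = raised[k] / total if total > 0 else 0.0
    return out


# Toy input: three clustering methods, two clusters, three genes.
# The methods disagree only about the middle gene.
method_a = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
method_b = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
method_c = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

copam = consensus_partition([method_a, method_b, method_c])
sharpened = sharpen(copam, power=2.0)

for name, matrix in (("consensus", copam), ("sharpened", sharpened)):
    print(name, [[round(u, 3) for u in row] for row in matrix])
```

In this toy case the middle gene has consensus memberships of 2/3 and 1/3, which the sharpening step moves to 0.8 and 0.2, while genes on which all methods agree stay at exactly zero or one. A larger `power` pushes disagreements harder towards binary values.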

Summary

Introduction

Advances in microarray technology have enabled the expression of a vast number of genes to be measured simultaneously. Most microarray experiments measure the expression values of the entire genome of a specific organism over multiple time-points, several biological developmental stages, different types of tissues, or different conditions [1]. Many different methods of microarray analysis have been designed and applied to address the diverse questions that such experiments raise. Various supervised and unsupervised methods have been designed to answer questions related to the co-expression of genes [8,9,10,11,12].

Methods

Results

Discussion

Conclusion