Abstract

Genome-wide gene expression analyses are routinely used to gain a systems-level understanding of complex processes, including network connectivity. Network connectivity tends to be built on a small subset of extremely high co-expression signals that are deemed significant, but this overlooks the vast majority of pairwise signals. Here, we developed a computational pipeline that assigns each gene's genome-wide distribution of pairwise co-expression values to one of eight template distribution shapes. The templates vary between unimodal and bimodal, and between skewed and symmetrical, representing different proportions of positive and negative correlations. We then used a hypergeometric test to determine whether specific gene classes (regulators versus non-regulators) and properties (differentially expressed or not) are associated with a particular distribution shape. We applied our methodology to five publicly available RNA sequencing (RNA-seq) datasets from four organisms in different physiological conditions and tissues. Our results suggest that genes can be assigned consistently to pre-defined distribution shapes, with respect to the enrichment of differentially expressed and regulatory genes, in situations involving contrasting phenotypes, time series, or physiological baseline data. The genome-wide distribution of co-expression values carries a striking additional biological signal that is overlooked by currently adopted approaches. Our method can be applied to extract further information from transcriptomic data and to help uncover the molecular mechanisms involved in the regulation of complex biological processes and phenotypes.

Highlights

  • Uncovering the genetic architecture behind complex phenotypes involves analyzing a large variety of genes that interact with each other to respond to environmental stimuli [1]

  • Our methodology aims to evaluate co-expression distributions at the individual gene level. To do so, we calculated all correlations across all genes for each of the five RNA-seq datasets we evaluated: Cattle Feed Efficiency, Cattle Puberty, Drosophila Embryogenesis, Duck Preadipocyte, and Human

  • The number of positive correlations is especially elevated in the Cattle Feed Efficiency dataset


Summary

Introduction

Uncovering the genetic architecture behind complex phenotypes involves analyzing a large variety of genes that interact with each other to respond to environmental stimuli [1]. The implication is that, within the network, different genes exhibit different “behaviors” (i.e., different abilities to influence other molecules in the network), represented by the strength and number of their correlation coefficients. With this in mind, we propose that individual genes possess genome-wide distributions of co-expression values that may differ from the distribution observed when all gene pairs are examined together. These gene-specific distributions may respond to environmental conditions and/or physiological state, producing different distributions in different biological circumstances.
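The contrast between a gene-specific distribution and the pooled distribution over all pairs can be illustrated with a minimal sketch. It assumes Pearson correlation (the text here does not state which coefficient is used), and the gene names, expression values, and helper functions are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists.
    Assumes neither profile is constant (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def per_gene_distributions(expr):
    """Map each gene to the list of its correlations with every other gene.
    `expr` maps a gene name to its expression values across samples."""
    genes = list(expr)
    return {
        g: [pearson(expr[g], expr[h]) for h in genes if h != g]
        for g in genes
    }

def fraction_positive(values):
    """Share of correlation values that are strictly positive."""
    return sum(v > 0 for v in values) / len(values)

# Hypothetical toy data: three genes measured in four samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

dists = per_gene_distributions(expr)

# Pooled view: all pairwise values together (each pair appears twice,
# which does not change the proportion of positive values).
pooled = [r for values in dists.values() for r in values]

gene_level = {g: fraction_positive(v) for g, v in dists.items()}
pooled_level = fraction_positive(pooled)
```

In this toy case the pooled proportion of positive correlations is a single number, whereas the gene-level proportions differ from gene to gene, which is the kind of per-gene information a pooled summary hides.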

