Abstract

Comparative studies of gene expression across species have revealed many important insights, but have also been limited by the number of species represented. Here we develop an approach to identify orthologs between highly diverged transcriptome assemblies, and apply this to 657 RNA-seq gene expression profiles from 309 diverse unicellular eukaryotes. We analyze the resulting data for coevolutionary patterns, and identify several hundred protein complexes and pathways whose expression levels have evolved in a coordinated fashion across the trillions of generations separating these species, including many gene sets with little or no within-species co-expression across environmental or genetic perturbations. We also detect examples of adaptive evolution, such as the evolution of tRNA ligase levels to match genome-wide codon usage. In sum, we find that comparative studies of extremely diverse organisms can reveal new insights into the evolution of gene expression, including coordinated evolution of some of the most conserved protein complexes in eukaryotes.

Highlights

  • Comparative studies of gene expression across species have revealed many important insights, but have been limited by the number of species represented

  • We have made the full data set available in an interactive website at http://mmetspdata.appspot.com

  • Although many databases of orthologs exist, these cannot be applied to the de novo assembled transcriptomes of the Marine Microbial Eukaryotic Transcriptome Project (MMETSP)

Summary

Introduction

Comparative studies of gene expression across species have revealed many important insights, but have been limited by the number of species represented. Measured gene expression levels could potentially uncover genes with correlated evolution, including genes that are never lost and are therefore not amenable to phylogenetic profiling (PP). In practice this has not been possible because of the small number of species, and the narrow phylogenetic breadth, in previous studies of gene expression evolution. The largest such studies have been limited to a few dozen species and have focused exclusively on mammals[13,14] or yeast[15], in contrast to recent PP studies that utilize hundreds of complete genome sequences from widely divergent species[6,16,17]. The Marine Microbial Eukaryotic Transcriptome Project (MMETSP)[18] recently generated what is by far the largest multispecies gene expression data set to date, both in terms of the number of species and the phylogenetic diversity, with RNA-seq for 657 samples from 309 species. These species are eukaryotic marine microbes collected from across the world, spanning most major eukaryotic lineages, including many rarely studied phyla that lack even a single sequenced genome (Fig. 1a)[18]. We have made the full data set available in an interactive website at http://mmetspdata.appspot.com.
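
To make the notion of correlated expression evolution concrete, the sketch below computes a Spearman rank correlation between the expression levels of two orthologous genes measured across several species. It is an illustration only and not the authors' method: the function names and the expression values are hypothetical, and a naive correlation of this kind ignores the shared ancestry of related species, which a real analysis would need to account for.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical expression levels of two orthologs, one value per species.
gene_a = [1.0, 2.5, 3.1, 4.8, 6.0]
gene_b = [0.9, 2.0, 3.5, 4.1, 7.2]
rho = spearman(gene_a, gene_b)
print(f"Spearman rho across species: {rho:.2f}")
```

A rank-based statistic is used here because expression levels from different species and samples are unlikely to be on a strictly comparable linear scale; that choice is an assumption of this sketch rather than something stated in the text.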
