Abstract

Many phylogenomic studies based on transcriptomes have been limited to “single-copy” genes due to methodological challenges in homology and orthology inferences. Only a relatively small number of studies have explored analyses beyond reconstructing species relationships. We sampled 69 transcriptomes in the hyperdiverse plant clade Caryophyllales and 27 outgroups from annotated genomes across eudicots. Using a combined similarity- and phylogenetic tree-based approach, we recovered 10,960 homolog groups, where each was represented by at least eight ingroup taxa. By decomposing these homolog trees, and taking gene duplications into account, we obtained 17,273 ortholog groups, where each was represented by at least ten ingroup taxa. We reconstructed the species phylogeny using a 1,122-gene data set with a gene occupancy of 92.1%. From the homolog trees, we found that both synonymous and nonsynonymous substitution rates in herbaceous lineages are up to three times as fast as in their woody relatives. This is the first time such a pattern has been shown across thousands of nuclear genes with dense taxon sampling. We also pinpointed regions of the Caryophyllales tree that were characterized by relatively high frequencies of gene duplication, including three previously unrecognized whole-genome duplications. By further combining information from homolog tree topology and synonymous distance between paralog pairs, we identified phylogenetic locations for 13 putative genome duplication events. Genes that experienced the greatest gene family expansion were concentrated among those involved in signal transduction and oxidoreduction, including a cytochrome P450 gene that encodes a key enzyme in the betalain synthesis pathway. Our study demonstrates a new approach for functional phylogenomic analysis in nonmodel species that is based on homolog groups in addition to inferred ortholog groups.
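The two filtering quantities mentioned in the abstract (a minimum number of ingroup taxa per group, and the gene occupancy of the final data set) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy taxon labels, and the definition of occupancy as the fraction of filled cells in a gene-by-taxon matrix are all assumptions made for illustration.

```python
from typing import Dict, Set

# Hypothetical toy data: each group maps to the set of taxa it contains.
INGROUP: Set[str] = {"A", "B", "C"}
OUTGROUP: Set[str] = {"X"}
ALL_TAXA: Set[str] = INGROUP | OUTGROUP

GROUPS: Dict[str, Set[str]] = {
    "g1": {"A", "B", "C", "X"},
    "g2": {"A", "X"},
    "g3": {"A", "B"},
}


def filter_by_ingroup(
    groups: Dict[str, Set[str]], ingroup: Set[str], min_taxa: int
) -> Dict[str, Set[str]]:
    """Keep only groups represented by at least `min_taxa` ingroup taxa."""
    return {
        gid: taxa
        for gid, taxa in groups.items()
        if len(taxa & ingroup) >= min_taxa
    }


def gene_occupancy(groups: Dict[str, Set[str]], all_taxa: Set[str]) -> float:
    """Fraction of filled cells in the gene-by-taxon presence matrix.

    Assumed definition; the source does not spell out its formula.
    """
    if not groups or not all_taxa:
        return 0.0
    filled = sum(len(taxa & all_taxa) for taxa in groups.values())
    return filled / (len(groups) * len(all_taxa))


# Keep groups with at least two ingroup taxa, then measure occupancy.
KEPT = filter_by_ingroup(GROUPS, INGROUP, min_taxa=2)
OCCUPANCY = gene_occupancy(KEPT, ALL_TAXA)
```

In this toy case `g2` is dropped because it contains only one ingroup taxon. The thresholds reported in the abstract (eight ingroup taxa for homolog groups, ten for ortholog groups) would be passed as `min_taxa` under the same assumptions.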

Highlights

  • Transcriptome sequencing, or RNA-seq, has shown huge potential for understanding the genetic and genomic bases of diversification in non-model systems

  • Methodological issues in homology and orthology inference, especially in accommodating the frequent genome duplications in plants, have resulted in the discarding of a large proportion of genes from previous phylogenomic studies (Yang and Smith 2014). These limitations, together with the dynamic nature of gene expression, gene duplication and loss, lineage-specific heterogeneity in substitution rates, and gene tree topology discordance, have resulted in sparse matrices among nuclear genes in prior analyses, leading researchers to reduce data sets to a small number of genes for analysis. These challenges have limited many RNA-seq phylogenomic studies to inferring a species tree, and only a few studies have explored transcriptome-wide functional analyses beyond one-to-one orthologs or genes involved in a particular functional category (Barker et al 2008; Lee et al 2011)

  • Exploring functional categories of genes: aside from the phylogenetic locations of gene duplications, we investigated functional categories of genes that showed high taxon occupancy or lineage-specific duplications

Introduction

Transcriptome sequencing, or RNA-seq, has shown huge potential for understanding the genetic and genomic bases of diversification in non-model systems (for example, Barker et al 2008; Dunn et al 2008; Lee et al 2011; Wickett et al 2011; Delaux et al 2014; Li et al 2014; Misof et al 2014; Sveinsson et al 2014; Wickett et al 2014; Cannon et al 2015; Hollister et al 2015). Methodological issues in homology and orthology inference, especially in accommodating the frequent genome duplications in plants, have resulted in the discarding of a large proportion of genes from previous phylogenomic studies (Yang and Smith 2014). These limitations, together with the dynamic nature of gene expression, gene duplication and loss, lineage-specific heterogeneity in substitution rates, and gene tree topology discordance, have resulted in sparse matrices among nuclear genes in prior analyses, leading researchers to reduce data sets to a small number of genes for analysis. The extraordinary diversity in growth forms and ecological adaptations makes Caryophyllales an ideal group for investigating gene and genome evolution and heterogeneity in molecular substitution rate.

