Abstract

Many commonly used models of molecular evolution assume homogeneous nucleotide frequencies. Deviations from this assumption have been shown to cause problems for phylogenetic inference. However, some authors claim that only extreme heterogeneity affects phylogenetic accuracy and suggest that violations of other model assumptions, such as variable rates among sites, are more problematic. To explore the interaction between compositional heterogeneity and variable rates among sites, I reanalyzed three real compositionally heterogeneous datasets using several models. My Bayesian analyses recover accurate topologies under models that allow rates to vary among sites, but fail under some models that account for compositional heterogeneity. I also ran simulations and found that accounting for among-site rate variation improves topological accuracy in compositionally heterogeneous data. This indicates that, in some cases, models accounting for among-site rate variation can improve outcomes for data that violate the assumption of compositional homogeneity.

Highlights

  • Recent phylogenetic studies have explored the effect of compositional heterogeneity on phylogenetic methods

  • Compositional heterogeneity can arise in a dataset as a result of nonstationary evolution

  • If two nonsister subtrees have similar substitution biases, this can lead to convergence in nucleotide composition (CNC). These taxa may look similar because of convergent evolution rather than common ancestry, which can mislead phylogenetic analysis. There are several methods to detect and quantify the level of compositional heterogeneity in a dataset, including chi-squared tests (e.g., [1]), the Disparity Index [2], and relative-rates tests [3].
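As a rough illustration of the first method listed, a chi-squared test of compositional homogeneity can be sketched as a contingency-table test on per-taxon counts of A, C, G and T. This is a minimal sketch under stated assumptions, not the implementation used in [1] or in any particular package: the function names are hypothetical, gaps and ambiguity codes are simply ignored, and the p-value comes from a hand-rolled regularized incomplete gamma function (series plus continued fraction) so that the example has no dependencies.

```python
import math

NUCLEOTIDES = "ACGT"


def base_counts(seq):
    """Count A, C, G and T in one sequence; gaps and ambiguity codes are ignored."""
    seq = seq.upper()
    return [seq.count(base) for base in NUCLEOTIDES]


def _lower_series(a, x):
    """Regularized lower incomplete gamma P(a, x) by power series (used when x < a + 1)."""
    term = 1.0 / a
    total = term
    for n in range(1, 10000):
        term *= x / (a + n)
        total += term
        if term < total * 1e-16:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_cf(a, x, eps=1e-14, max_iter=10000):
    """Regularized upper incomplete gamma Q(a, x) by continued fraction (used when x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * h


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared statistic x with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    if z < a + 1.0:
        return max(0.0, 1.0 - _lower_series(a, z))
    return _upper_cf(a, z)


def composition_chi2(seqs):
    """Chi-squared test of base-composition homogeneity across taxa.

    seqs maps taxon name -> sequence. Returns (statistic, df, p_value).
    Each sequence is treated as an independent sample (shared ancestry is ignored).
    """
    table = [base_counts(s) for s in seqs.values()]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(NUCLEOTIDES) - 1)
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    # Made-up toy data: one GC-rich and one AT-rich taxon.
    toy = {"taxon_gc": "GC" * 50, "taxon_at": "AT" * 50}
    stat, df, p = composition_chi2(toy)
    print(f"chi2 = {stat:.1f}, df = {df}, p = {p:.3g}")
```

For the toy data every expected count is 25, so the statistic is 200.0 with 3 degrees of freedom. Because this kind of test ignores the phylogenetic non-independence of the sequences, its p-values should be interpreted with caution.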

Summary

Introduction

Recent phylogenetic studies have explored the effect of compositional heterogeneity on phylogenetic methods. Some models allow sites to be assigned to an invariable category ("+I"), which works in roughly the same way as gamma-distributed rate variation ("+G") but fixes the substitution rate at 0 instead of letting it vary according to a gamma shape parameter. Several studies, including those mentioned above, have suggested that violations of the assumption of constant rates among sites are more problematic than violations of compositional homogeneity [7,8,9,10]. These results illustrate that, despite the base compositional bias, the Bayesian GTR+I+G model succeeded. Does this model succeed because it accounts for among-site rate variation? They ascribed the failure of many phylogenetic methods to the confounding signal of convergent nucleotide composition, but the first two reports did not thoroughly explore models that account for among-site rate variation. My results show that when among-site rate heterogeneity is accounted for, Bayesian inference improves topology recovery in each study; in each case, I find the GTR+I+G, GTR+I, and GTR+G models to outperform the GTR model alone.
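The "+I" and "+G" rate components discussed above can be made concrete with a small sketch. It assumes the discrete-gamma approximation (Yang, 1994) with four equal-probability categories whose rates are the category means, rescales variable-site rates by 1/(1 - p_inv) so that the overall mean rate stays 1 (one common convention, not necessarily the one any particular program uses), and substitutes the Jukes-Cantor (JC69) model for GTR purely to keep the arithmetic short. All function names are hypothetical, and this is not the analysis pipeline used in the study.

```python
import math


def reg_lower_gamma(a, x, max_terms=10000):
    """Regularized lower incomplete gamma P(a, x) via its power series.

    Adequate for the moderate arguments used here; not a general-purpose
    implementation for very large x.
    """
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < total * 1e-16:
            break
    return min(1.0, math.exp(a * math.log(x) - x - math.lgamma(a) + math.log(total)))


def gamma_quantile(p, shape):
    """p-quantile of a mean-1 gamma distribution (shape = rate = alpha), by bisection."""
    # The CDF of Gamma(alpha, rate=alpha) at r is P(alpha, alpha * r).
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(shape, shape * hi) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(shape, shape * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(shape, k=4):
    """Mean rate of each of k equal-probability gamma categories ("mean" method)."""
    cuts = [gamma_quantile(i / k, shape) for i in range(1, k)]
    # For a mean-1 gamma, E[r; r < b] = P(alpha + 1, alpha * b).
    cum = [0.0] + [reg_lower_gamma(shape + 1.0, shape * b) for b in cuts] + [1.0]
    return [k * (cum[i + 1] - cum[i]) for i in range(k)]


def plus_i_plus_g(shape, p_inv=0.0, k=4):
    """(rate, weight) pairs for a hypothetical +I+G site-rate mixture with mean rate 1."""
    mix = [(r / (1.0 - p_inv), (1.0 - p_inv) / k) for r in discrete_gamma_rates(shape, k)]
    if p_inv > 0:
        mix.append((0.0, p_inv))  # the invariable ("+I") category: rate fixed at 0
    return mix


def jc_expected_difference(d, mix):
    """Expected proportion of differing sites between two JC69 sequences d substitutions/site apart."""
    return sum(w * 0.75 * (1.0 - math.exp(-4.0 * r * d / 3.0)) for r, w in mix)


if __name__ == "__main__":
    print("gamma(0.5) category rates:", [round(r, 4) for r in discrete_gamma_rates(0.5)])
    models = [
        ("JC", [(1.0, 1.0)]),
        ("JC+G", plus_i_plus_g(0.5)),
        ("JC+I+G", plus_i_plus_g(0.5, 0.2)),
    ]
    for name, mix in models:
        print(name, round(jc_expected_difference(1.0, mix), 4))
```

Because 1 - exp(-4rd/3) is concave in the rate r, averaging over rate categories lowers the expected proportion of differing sites for the same branch length. Read the other way, a model that ignores among-site rate variation will tend to underestimate the amount of change behind a given observed divergence.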

Methods
Results and Discussion
Caluromys lanatus
Panulirus