Abstract

We combined the category (CAT) mixture model (Lartillot N, Philippe H. 2004) and the nonstationary break point (BP) model (Blanquart S, Lartillot N. 2006) into a new model, CAT-BP, accounting for variations of the evolutionary process both along the sequence and across lineages. As in CAT, the model implements a mixture of distinct Markovian processes of substitution distributed among sites, thus accommodating site-specific selective constraints induced by protein structure and function. Furthermore, as in BP, these processes are nonstationary, and their equilibrium frequencies are allowed to change along lineages in a correlated way, through discrete shifts in global amino acid composition distributed along the phylogenetic tree. We implemented the CAT-BP model in a Bayesian Markov Chain Monte Carlo framework and compared its predictions with those of 3 simpler models, BP, CAT, and the site- and time-homogeneous general time-reversible (GTR) model, on a concatenation of 4 mitochondrial proteins of 20 arthropod species. In contrast to GTR, BP, and CAT, which all display a phylogenetic reconstruction artifact positioning the bees Apis mellifera and Melipona bicolor among chelicerates, the CAT-BP model is able to recover the monophyly of insects. Using posterior predictive tests, we further show that the CAT-BP combination yields better anticipations of site- and taxon-specific amino acid frequencies and that it better accounts for the homoplasies that are responsible for the artifact. Altogether, our results show that the joint modeling of heterogeneities across sites and along time results in a synergistic improvement of the phylogenetic inference, indicating that it is essential to disentangle the combined effects of both sources of heterogeneity, in order to overcome systematic errors in protein phylogenetic analyses.

Highlights

  • The “pruning” algorithm (Felsenstein 1981) was originally devised for data likelihood computation under the so-called F81 Markovian substitution process. It opened the way for probabilistic approaches in phylogenetics, first using maximum likelihood (ML), and subsequently using Bayesian analysis based on Markov Chain Monte Carlo (MCMC) sampling

  • Similar results were later obtained by Delsuc et al. (2005) on the same 4 mitochondrial genes, with a reduced taxon resampling of 20 arthropod species. Investigating this second data set, Delsuc et al. (2005) showed that the ML phylogenetic reconstruction clusters 4 chelicerates among insects, on the branch leading to the hymenopterans A. mellifera and M. bicolor, while the ML analysis of the RY-coded data set succeeds in recovering both insect and chelicerate monophyly

Summary

Introduction

The “pruning” algorithm (Felsenstein 1981) was originally devised for data likelihood computation under the so-called F81 Markovian substitution process. It opened the way for probabilistic approaches in phylogenetics, first using maximum likelihood (ML), and subsequently using Bayesian analysis based on Markov Chain Monte Carlo (MCMC) sampling. Strong assumptions were made concerning 1) the constancy of the overall rate of substitution across sites as well as along lineages, 2) the independence between positions along the sequence, and 3) the use of a single Markovian substitution process applied along all lineages as well as over all sites. Following this simplified but seminal version, many models relaxing those assumptions have been proposed.
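
To make the likelihood computation mentioned above concrete, here is a minimal sketch of the pruning recursion for a single site under an F81 process. It is an illustration only, not code from the paper: the function names, the nested-list tree encoding, the nucleotide alphabet, and the rate normalization (one expected substitution per unit of branch length) are all assumptions chosen for brevity.

```python
import math

STATES = "ACGT"  # assumed alphabet; the same recursion applies to amino acids


def f81_transition(pi, t):
    """Transition matrix P(t) of an F81 process with equilibrium frequencies pi.

    Assumption: the rate is scaled so that t is measured in expected
    substitutions per site.
    """
    beta = 1.0 / (1.0 - sum(p * p for p in pi))
    decay = math.exp(-beta * t)
    n = len(pi)
    return [[decay * (i == j) + (1.0 - decay) * pi[j] for j in range(n)]
            for i in range(n)]


def conditional_likelihoods(node, site, pi):
    """Post-order pass returning, for each state, the probability of the
    data observed below `node` given that state at `node`.

    A leaf is a taxon name (str); an internal node is a list of
    (child, branch_length) pairs. This encoding is hypothetical.
    """
    if isinstance(node, str):
        observed = site[node]
        return [1.0 if s == observed else 0.0 for s in STATES]
    partial = [1.0] * len(STATES)
    for child, t in node:
        p = f81_transition(pi, t)
        below = conditional_likelihoods(child, site, pi)
        for i in range(len(STATES)):
            partial[i] *= sum(p[i][j] * below[j] for j in range(len(STATES)))
    return partial


def site_likelihood(tree, site, pi):
    """Likelihood of one alignment column, averaging the root state over pi."""
    root = conditional_likelihoods(tree, site, pi)
    return sum(f * l for f, l in zip(pi, root))


# Usage: three taxa, one alignment column.
tree = [([("a", 0.1), ("b", 0.2)], 0.05), ("c", 0.3)]
pi = [0.1, 0.2, 0.3, 0.4]
lik = site_likelihood(tree, {"a": "A", "b": "A", "c": "G"}, pi)
```

Under the three assumptions listed above, the likelihood of a whole alignment is the product of such per-site values, computed with one shared set of frequencies and branch lengths. The models discussed here relax exactly that sharing, across sites (CAT) and across lineages (BP).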

