Phylogenetic Mixture Model Research Articles

Rate variation among the sites of a molecular sequence is commonly found in applications of phylogenetic inference. Several approaches exist to account for this feature but they do not usually enable the investigator to pinpoint the sites that evolve under one or another rate of evolution in a straightforward manner. The focus is on Bayesian phylogenetic mixture models, augmented with allocation variables, as tools for site classification and quantification of classification uncertainty. The method does not rely on prior knowledge of site membership to classes or even the number of classes. Furthermore, it does not require correlated sites to be next to one another in the sequence alignment, unlike some phylogenetic hidden Markov or change-point models. In the approach presented, model selection on the number and type of mixture components is conducted ahead of both model estimation and site classification; the steppingstone sampler (SS) is used to select amongst competing mixture models. Example applications to simulated data and to mitochondrial DNA of primates illustrate site classification via ‘augmented’ Bayesian phylogenetic mixtures. In both examples, all mixtures outperform commonly used models of among-site rate variation and models that do not account for rate heterogeneity. The examples further demonstrate how site classification is readily available from the analysis output. The method is directly relevant to the choice of partitions in Bayesian phylogenetics, and its application may lead to the discovery of structure not otherwise recognised in a molecular sequence alignment. Computational aspects of Bayesian phylogenetic model estimation are discussed, including the use of simple Markov chain Monte Carlo (MCMC) moves that mix efficiently without tempering the chains.
The contribution to the field of Bayesian phylogenetics is in (1) the use of mixture models augmented with allocation variables as tools for site classification and quantification of classification uncertainty, (2) the successful application of SS for selection of phylogenetic mixtures, and (3) the development of novel MCMC aspects of relevance to Bayesian phylogenetic models, whether mixtures or not. (Footnote 1: The MCMC methods discussed in this paper have been coded in a C program; source files are available upon request.) Supplementary material is available online (see Appendix A).
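
To make the idea of classifying sites from allocation variables concrete, here is a minimal sketch in Python. It is not the authors' C program. It assumes that an MCMC run has already produced, for each retained draw, one class label per alignment site, and that the labels mean the same thing across draws (no label switching). The function name `classify_sites` and the toy draws are hypothetical.

```python
from collections import Counter


def classify_sites(allocation_samples, n_classes):
    """Summarise sampled allocation variables per site.

    allocation_samples: list of MCMC draws; each draw is a list holding
    one class label (0 .. n_classes-1) per alignment site.
    Returns (probs, labels): the estimated posterior class probabilities
    for each site and the most probable class for each site.
    Assumes class labels are comparable across draws (no label switching).
    """
    n_draws = len(allocation_samples)
    n_sites = len(allocation_samples[0])
    probs = []
    for site in range(n_sites):
        # Count how often this site was allocated to each class.
        counts = Counter(draw[site] for draw in allocation_samples)
        probs.append([counts.get(k, 0) / n_draws for k in range(n_classes)])
    # Classify each site by its highest-frequency class.
    labels = [max(range(n_classes), key=lambda k: p[k]) for p in probs]
    return probs, labels


# Toy input: 4 hypothetical MCMC draws for an alignment of 3 sites.
draws = [
    [0, 1, 1],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]
probs, labels = classify_sites(draws, n_classes=2)
```

The per-site frequencies in `probs` serve as the measure of classification uncertainty: a site allocated to one class in every draw is classified with confidence, while a site split between classes is not.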

Background: Phylogenetic reconstruction methods based on gene content often place all the parasitic and endosymbiotic eubacteria (parasites for short) together in a clan. Many other lines of evidence point to this parasites clan being an artefact. This artefact could be a consequence of the methods used to construct ortholog databases (due to some unknown bias), the methods used to estimate the phylogeny, or both.

We test the idea that the parasites clan is an ortholog identification artefact by analyzing three different ortholog databases (COG, TRIBES, and OFAM), which were constructed using different methods, and are thus unlikely to share the same biases. In each case, we estimate a phylogeny using an improved version of the conditioned logdet distance method. If the parasites clan appears in trees from all three databases, it is unlikely to be an ortholog identification artefact.

Accelerated loss of a subset of gene families in parasites (a form of heterotachy) may contribute to the difficulty of estimating a phylogeny from gene content data. We test the idea that heterotachy is the underlying reason for the estimation of an artefactual parasites clan by applying two different mixture models (phylogenetic and non-phylogenetic), in combination with conditioned logdet. In these models, there are two categories of gene families, one of which has accelerated loss in parasites. Distances are estimated separately from each category by conditioned logdet. This should reduce the tendency for tree estimation methods to group the parasites together, if heterotachy is the underlying reason for estimation of the parasites clan.

Results: The parasites clan appears in conditioned logdet trees estimated from all three databases. This makes it less likely to be an artefact of database construction. The non-phylogenetic mixture model gives trees without a parasites clan. However, the phylogenetic mixture model still results in a tree with a parasites clan. Thus, it is not entirely clear whether heterotachy is the underlying reason for the estimation of a parasites clan. Simulation studies suggest that the phylogenetic mixture model approach may be unsuccessful because the model of gene family gain and loss it uses does not adequately describe the real data.

Conclusions: The most successful methods for estimating a reliable phylogenetic tree for parasitic and endosymbiotic eubacteria from gene content data are still ad-hoc approaches such as the SHOT distance method. However, the improved conditioned logdet method we developed here may be useful for non-parasites and can be accessed at http://www.liv.ac.uk/~cgrbios/cond_logdet.html
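
For readers unfamiliar with logdet distances, the following Python sketch computes a standard normalised LogDet distance between two genomes coded as presence/absence (1/0) vectors over gene families. It illustrates the plain LogDet idea only. It is not the conditioned logdet method of the abstract, whose conditioning step is not described here. The function name `logdet_distance` and the toy vectors are hypothetical.

```python
import math


def logdet_distance(x, y):
    """Normalised LogDet distance for two binary (0/1) character vectors.

    Uses d = -(1/r) * ln(det F / sqrt(det Dx * det Dy)) with r = 2 states,
    where F is the 2x2 joint frequency matrix and Dx, Dy are the diagonal
    matrices of marginal state frequencies. This is the plain LogDet form,
    not the conditioned variant described in the abstract.
    """
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    # F[a][b] = fraction of gene families with state a in x and state b in y.
    F = [[0.0, 0.0], [0.0, 0.0]]
    for a, b in zip(x, y):
        F[a][b] += 1.0 / n
    det_F = F[0][0] * F[1][1] - F[0][1] * F[1][0]
    # Marginal state frequencies for each genome.
    px = [F[0][0] + F[0][1], F[1][0] + F[1][1]]
    py = [F[0][0] + F[1][0], F[0][1] + F[1][1]]
    norm = math.sqrt(px[0] * px[1] * py[0] * py[1])
    if det_F <= 0.0 or norm <= 0.0:
        raise ValueError("LogDet distance is undefined for these data")
    return -0.5 * math.log(det_F / norm)


# Toy presence/absence profiles over 8 hypothetical gene families.
genome_a = [1, 1, 1, 0, 0, 0, 1, 0]
genome_b = [1, 1, 0, 0, 0, 1, 1, 0]
d_ab = logdet_distance(genome_a, genome_b)
d_aa = logdet_distance(genome_a, genome_a)
```

In a two-category mixture setting like the one described, a function of this kind would be applied separately to the gene families assigned to each category, giving one distance matrix per category.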

Articles published on Phylogenetic Mixture Model

Dimensions of Group-Based Phylogenetic Mixtures.

Classification of molecular sequence data using Bayesian phylogenetic mixture models

When Do Phylogenetic Mixture Models Mimic Other Phylogenetic Models?

The Disentangling Number for Phylogenetic Mixtures

A phylogenetic mixture model for the identification of functionally divergent protein residues

Identifiability of Large Phylogenetic Mixture Models

On the artefactual parasitic eubacteria clan in conditioned logdet phylogenies: heterotachy and ortholog identification artefacts as explanations

A Phylogenetic Mixture Model for Gene Family Loss in Parasitic Bacteria

Networks, trees, and treeshrews: assessing support and identifying conflict with multiple loci and a problematic root.

Phylogenetic mixture models for proteins

Modelling heterotachy in phylogenetic inference by reversible-jump Markov chain Monte Carlo

Phylogenetic Mixture Models Can Reduce Node-Density Artifacts

Medusozoan Phylogeny and Character Evolution Clarified by New Large and Small Subunit rDNA Data and an Assessment of the Utility of Phylogenetic Mixture Models

A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data.
