Abstract

We analyze a maximum likelihood approach for combining phylogenetic trees into a larger "supertree." This is based on a simple exponential model of phylogenetic error, which ensures that ML supertrees have a simple combinatorial description (as a median tree, minimizing a weighted sum of distances to the input trees). We show that this approach to ML supertree reconstruction is statistically consistent (it converges on the true species supertree as more input trees are combined), in contrast to the widely used MRP method, which we show can be statistically inconsistent under the exponential error model. We also show that this statistical consistency extends to an ML approach for constructing species supertrees from gene trees. In this setting, incomplete lineage sorting (due to coalescence rates of homologous genes being lower than speciation rates) has been shown to lead to gene trees that are frequently different from species trees, and this can confound efforts to reconstruct the species phylogeny correctly.

Highlights

  • Combining trees on different, overlapping sets of taxa into a parent ‘supertree’ is a mainstream strategy for constructing large phylogenetic trees

  • It is not clear what supertree methods are trying to achieve: are we, as some maintain, summarising the phylogenetic information contained in a group of subtrees? Or are we trying to derive the best estimate of phylogeny given the information at hand? Nor is it clear which of these two conceptually different objectives underpins the various supertree reconstruction methods

  • Simple majority-rule approaches have recently been shown to be misleading when combining gene trees; we show that an ML supertree approach for combining gene trees is statistically consistent


Summary

Introduction

Keywords: phylogenetic supertree, maximum likelihood, gene tree, species tree, statistical consistency.

Combining trees on different, overlapping sets of taxa into a parent ‘supertree’ is a mainstream strategy for constructing large phylogenetic trees. We analyse one approach to obtaining maximum-likelihood (ML) estimates of supertrees, based on a probability model that permits ‘errors’ in subtree topologies. A special case of the supertree problem, the consensus setting, arises when the taxon sets of the input trees are all the same (X1 = X2 = ··· = Xk). In an early paper, McMorris (1990) described how, in this consensus setting, the majority-rule consensus tree can be given a maximum likelihood interpretation. That approach is quite different from the one described here (even when the latter is restricted to the consensus problem).
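As a sketch of the median-tree description mentioned in the abstract, the exponential error model can be written out as below. The notation is an assumption made for illustration and is not quoted from the full text: T_1, ..., T_k are the input trees on taxon sets X_1, ..., X_k, T is a candidate supertree, T|X_i is T restricted to X_i, d is a metric on trees, β_i > 0 is a weight for the i-th input tree, and α_i is a normalising constant. The derivation also assumes that the input trees are independent and that α_i does not depend on T.

```latex
% Assumed exponential error model for each input tree T_i (i = 1, ..., k)
\[
  \mathbb{P}[\,T_i \mid T\,] \;=\; \alpha_i \exp\!\bigl(-\beta_i\, d(T_i,\, T|_{X_i})\bigr)
\]

% Log-likelihood of a candidate supertree T, assuming independent input trees
\[
  \log L(T) \;=\; \sum_{i=1}^{k} \log \alpha_i \;-\; \sum_{i=1}^{k} \beta_i\, d(T_i,\, T|_{X_i})
\]

% If the alpha_i do not depend on T, maximising L(T) is the same as
% minimising the weighted sum of distances, i.e. finding a median tree
\[
  \hat{T} \;=\; \operatorname*{arg\,min}_{T} \; \sum_{i=1}^{k} \beta_i\, d(T_i,\, T|_{X_i})
\]
```

Under these assumptions the first sum in the log-likelihood is constant in T, which is why the ML supertree coincides with the tree minimising the weighted sum of distances to the input trees. If the normalising constants did vary with T, that equivalence would no longer follow from this argument alone.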

An exponential model of phylogenetic error
Statistical consistency of ML supertrees under the exponential model
Relation to MRP and its statistical inconsistency
Technical remarks
Statistical consistency of ML species supertrees from multiple gene trees
Discussion
