Abstract

Background

Species phylogenies are not estimated directly, but rather through phylogenetic analyses of different gene datasets. However, true gene trees can differ from the true species tree (and hence from one another) due to biological processes such as horizontal gene transfer, incomplete lineage sorting, and gene duplication and loss, so no single gene tree is a reliable estimate of the species tree. Several methods have been developed to estimate species trees from estimated gene trees. They differ in the algorithmic technique used and in the biological model used to explain differences between species and gene trees. Little is known about how these methods compare in performance.

Results

We report on a study evaluating several methods for estimating species trees from sequence datasets, simulating sequence evolution under a complex model that includes indels (insertions and deletions), substitutions, and incomplete lineage sorting. The most important finding of our study is that some fast and simple methods are nearly as accurate as the most accurate methods, which employ sophisticated statistical techniques and are computationally intensive. We also observe that methods that explicitly consider errors in the estimated gene trees produce more accurate trees than methods that assume the estimated gene trees are correct.

Conclusions

Our study shows that highly accurate estimates of species trees are achievable, even when gene trees differ from each other and from the species tree, and that these estimates can be obtained using fairly simple and computationally tractable methods.

Highlights

  • Species phylogenies are not estimated directly, but rather through phylogenetic analyses of different gene datasets

  • The most computationally intensive methods we explored use MrBayes or RAxML to estimate distributions on gene trees, and the MrBayes analyses in particular are expensive

  • The datasets in our experiments vary in the causes of incongruence between estimated gene trees, rates of evolution, presence of indels, number of taxa, and type of alignment


Summary

Introduction

Species phylogenies are not estimated directly, but rather through phylogenetic analyses of different gene datasets. Several methods have been developed to estimate species trees from estimated gene trees, differing in the specific algorithmic technique used and in the biological model used to explain differences between species and gene trees. The most frequently used approaches for estimating species phylogenies compute an alignment for each gene, concatenate these alignments into one super-alignment, and estimate a tree from the super-alignment. These “combined analysis” methods do not have good statistical properties, because different regions of the genome can have different evolutionary histories. Several methods have also been developed that estimate species trees from estimated gene trees under the incomplete lineage sorting (ILS) model. Standard consensus tree methods have been shown not to be statistically consistent [10] (see [11,12]).
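The concatenation step of a combined analysis, described above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the input layout (a dict mapping each gene name to a dict of taxon name to aligned sequence), the hypothetical function name `concatenate`, and the choice of `-` to pad taxa that are missing from a gene are all illustrative choices. Estimating a tree from the resulting super-alignment (for example with RAxML) is left out.

```python
from typing import Dict

# Assumed input layout: gene name -> {taxon name -> aligned sequence}.
Alignment = Dict[str, str]


def concatenate(gene_alignments: Dict[str, Alignment], missing: str = "-") -> Dict[str, str]:
    """Hypothetical helper: join per-gene alignments into one super-alignment.

    Genes are appended in dict insertion order. A taxon absent from a gene
    is padded with `missing` characters so that every row of the
    super-alignment has the same length.
    """
    # Collect every taxon that appears in at least one gene alignment.
    taxa = sorted({taxon for aln in gene_alignments.values() for taxon in aln})
    rows: Dict[str, list] = {taxon: [] for taxon in taxa}

    for gene, aln in gene_alignments.items():
        # All rows of a valid alignment share one length (its column count).
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"gene {gene!r} is empty or has rows of unequal length")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, missing * width))

    return {taxon: "".join(parts) for taxon, parts in rows.items()}


if __name__ == "__main__":
    genes = {
        "g1": {"A": "AC-T", "B": "ACGT"},
        "g2": {"A": "GG", "C": "GA"},
    }
    for taxon, seq in concatenate(genes).items():
        print(taxon, seq)
```

For example, one gene sampled for taxa A and B and another sampled for A and C yield a three-row super-alignment in which B is padded across the second gene and C across the first. Padding keeps all rows the same length so that a single tree can be estimated from the combined matrix, which is also why concatenation implicitly treats every gene as sharing one evolutionary history.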

