Abstract
Phylogenetic trees inferred using commonly used models of sequence evolution are unrooted, but the root position matters both for interpretation and for downstream applications. This issue has long been recognized; however, whether the potential for discordance between the species tree and gene trees affects methods of rooting a phylogenetic tree has not been extensively studied. In this paper, we introduce a new method of rooting a tree based on its branch length distribution; our method, which minimizes the variance of root-to-tip distances, is inspired by traditional midpoint rerooting and is justified when deviations from the strict molecular clock are random. Like midpoint rerooting, the method can be implemented as a linear-time algorithm. In extensive simulations that consider discordance between gene trees and the species tree, we show that the new method is more accurate than midpoint rerooting, but its accuracy relative to outgroup rooting of gene trees depends on the size of the dataset and the level of deviation from the strict clock. We show high levels of error for all methods of rooting estimated gene trees, owing to factors that include gene tree discordance, deviations from the clock, and gene tree estimation error. Our simulations, however, did not reveal significant differences between two equivalent methods for species tree estimation that use rooted and unrooted input, namely STAR and NJst, respectively. Nevertheless, our results point to limitations of existing scalable rooting methods.
Highlights
Commonly used models of sequence evolution, such as GTR [1], are time reversible and can be used to reconstruct unrooted phylogenetic trees
RQ3: What is the impact of rooting error on the species tree estimation, and is STAR less accurate than its unrooted counterpart, NJst?
We introduce a new method for rooting phylogenetic trees, which relies on minimizing the variance of the root-to-tip distances
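The variance criterion can be illustrated with a small sketch. This is not the linear-time algorithm the abstract refers to; it is a brute-force, quadratic-time illustration that assumes the unrooted tree is given as an undirected edge list with branch lengths. The names `build_adj`, `leaf_distances`, and `min_var_root` are hypothetical. For each edge, the root-to-tip variance is a convex quadratic in the root's position along that edge, so the sketch solves for the minimizer in closed form, clamps it to the edge, and keeps the best edge overall.

```python
def build_adj(edges):
    """Undirected adjacency map {node: {neighbor: branch_length}}."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def leaf_distances(adj, start, avoid):
    """Path lengths from `start` to every tip reachable without crossing `avoid`."""
    out, stack = [], [(start, avoid, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        nbrs = [(n, w) for n, w in adj[node].items() if n != parent]
        if not nbrs:  # no way forward: `node` is a tip
            out.append(dist)
        for n, w in nbrs:
            stack.append((n, node, dist + w))
    return out


def min_var_root(adj):
    """Return (variance, u, v, x): root on edge (u, v) at distance x from u."""
    best, seen = None, set()
    for u in adj:
        for v, w in adj[u].items():
            if (v, u) in seen:  # each undirected edge is examined once
                continue
            seen.add((u, v))
            a = leaf_distances(adj, u, v)  # tips on u's side
            b = leaf_distances(adj, v, u)  # tips on v's side
            # Distance from u to every tip, and the sign with which moving
            # the root by x toward v changes each root-to-tip distance.
            d = a + [w + t for t in b]
            s = [1.0] * len(a) + [-1.0] * len(b)
            n = len(d)
            md, ms = sum(d) / n, sum(s) / n
            cov = sum((di - md) * (si - ms) for di, si in zip(d, s)) / n
            var_s = sum((si - ms) ** 2 for si in s) / n
            # Var(d + s*x) is quadratic in x; minimize, then clamp to the edge.
            x = min(max(-cov / var_s, 0.0), w)
            r = [di + si * x for di, si in zip(d, s)]
            mr = sum(r) / n
            var = sum((ri - mr) ** 2 for ri in r) / n
            if best is None or var < best[0]:
                best = (var, u, v, x)
    return best


# Toy clock-like tree: tips A, B attach to X; tips C, D attach to Y.
edges = [("X", "A", 1.0), ("X", "B", 1.0), ("X", "Y", 2.0),
         ("Y", "C", 1.0), ("Y", "D", 1.0)]
var, u, v, x = min_var_root(build_adj(edges))
```

On this toy tree all four tips can be made equidistant from the root, so the sketch should place the root at the middle of the X-Y branch. On real trees with clock deviations the minimum variance is generally non-zero.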
Summary
Commonly used models of sequence evolution, such as GTR [1], are time reversible and can be used to reconstruct unrooted phylogenetic trees. The correct placement of the root is often of intrinsic interest, as is evident from long debates on the correct rooting of the universal tree of life [2,3,4,5,6,7] and of other major groups (e.g., [5, 8, 9]). Knowledge of the root is often needed for downstream applications of phylogenetic trees, such as ancestral state reconstruction [10], comparative genomics [11], taxonomic profiling of metagenomic samples [12, 13], and dating.