Abstract

The transfer distance (TD) was introduced in the classification framework and studied in the context of phylogenetic tree matching. Recently, Lemoine et al. (Nature 556(7702):452–456, 2018. https://doi.org/10.1038/s41586-018-0043-0) showed that TD can be a powerful tool to assess the branch support on large phylogenies, thus providing a relevant alternative to Felsenstein’s bootstrap. This distance allows a reference branch β in a reference tree 𝒯 to be compared to a branch b from another tree T (typically a bootstrap tree), both on the same set of n taxa. The TD between these branches is the number of taxa that must be transferred from one side of b to the other in order to obtain β. By taking the minimum TD from β to all branches in T, we define the transfer index, denoted by ϕ(β, T), which measures the degree of agreement of T with β. Let us consider a reference branch β having p tips on its light side and define the transfer support (TS) as 1 − ϕ(β, T)/(p − 1). Lemoine et al. (2018) used computer simulations to show that the TS defined in this manner is close to 0 for random “bootstrap” trees. In this paper, we demonstrate that result mathematically: when T is randomly drawn, TS converges in probability to 0 as n tends to ∞. Moreover, we fully characterize the distribution of ϕ(β, T) on caterpillar trees, indicating that the convergence is fast, and that even when n is small, moderate levels of branch support cannot appear by chance.

Highlights

  • The transfer distance or R-distance was introduced in the classification framework by Day (1981) and Régnier (1965), as a measure of (dis)similarity between partitions of a set

  • We explored the behavior of transfer bootstrap expectation (TBE) as a measure of support for the branches of a phylogenetic tree, compared to that of Felsenstein’s support (FS)

  • These results demonstrate that the normalisation by p − 1 proposed by Lemoine et al. (2018) is fully justified for large n and irrespective of the shape of the inferred tree: in the absence of phylogenetic signal, TBE is close to 0

Summary

Introduction

The transfer distance or R-distance was introduced in the classification framework by Day (1981) and Régnier (1965), as a measure of (dis)similarity between partitions of a set. The main motivation for the present work is to study the properties of the transfer index and support, and to characterize their asymptotic behavior when the reference branch is compared to a tree T drawn randomly according to some null model, which reflects, in a bootstrap context, the absence of phylogenetic signal in the analyzed data set. When T is a caterpillar tree, we fully characterize the probability distribution of the transfer index based on a one-to-one correspondence between these trees and North-East (NE) lattice paths, a common technique for counting combinatorial objects (Mohanty 1979). All of these results show that p − 1 is the appropriate normalization constant for the TS and TBE, as proposed by Lemoine et al. (2018).
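To make the quantities above concrete, here is a minimal Python sketch of the transfer distance, transfer index, and transfer support, following the definitions given in the abstract. It is an illustration under stated assumptions, not the authors' implementation: a branch is represented by the set of taxa on one of its sides, a tree by the list of its branches (pendant branches included), and the function names and the helper `caterpillar_splits` are hypothetical. The transfer distance is computed as the smaller of the symmetric difference of the two sides and its complement count, which is one standard way to count the taxa that must change sides.

```python
def transfer_distance(side_a, side_b, taxa):
    """Number of taxa to move across branch b so that it matches branch beta.

    Each branch is given by the set of taxa on one of its sides (assumed encoding).
    """
    diff = len(side_a ^ side_b)          # taxa on different sides
    return min(diff, len(taxa) - diff)   # account for swapping the two sides of b


def transfer_index(ref_side, tree_splits, taxa):
    """Minimum transfer distance from the reference branch to any branch of T."""
    return min(transfer_distance(ref_side, s, taxa) for s in tree_splits)


def transfer_support(ref_side, tree_splits, taxa):
    """TS = 1 - phi / (p - 1), where p is the size of the light side (p >= 2)."""
    p = min(len(ref_side), len(taxa) - len(ref_side))
    if p < 2:
        raise ValueError("the reference branch must have at least 2 tips on its light side")
    return 1 - transfer_index(ref_side, tree_splits, taxa) / (p - 1)


def caterpillar_splits(order):
    """Branches of a caterpillar tree whose tips appear in the given order.

    Hypothetical helper: pendant branches are singletons, internal branches
    are the prefixes of length 2 .. n-2 of the tip order.
    """
    n = len(order)
    pendant = [frozenset([t]) for t in order]
    internal = [frozenset(order[:k]) for k in range(2, n - 1)]
    return pendant + internal


taxa = frozenset(range(1, 9))            # n = 8 taxa
beta = frozenset({1, 2, 3})              # reference branch, light side of size p = 3

agreeing = caterpillar_splits([1, 2, 3, 4, 5, 6, 7, 8])
scrambled = caterpillar_splits([1, 4, 2, 5, 3, 6, 7, 8])

print(transfer_support(beta, agreeing, taxa))   # branch present in T
print(transfer_support(beta, scrambled, taxa))  # branch absent from T
```

Because the pendant branch leading to a tip of the light side is always at distance p − 1 from the reference branch, the transfer index never exceeds p − 1 in this encoding, so the support stays between 0 and 1.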

Preliminaries
Comparing the transfer index to the parsimony score
Asymptotic results for fixed p
Behavior of the transfer distance when p grows with n
Exact distribution of the transfer index on caterpillar trees
Correspondence between bicolored caterpillar trees and NE lattice paths
Counting bicolorations through lattice paths: the transfer index distribution
Discussion
