Computing the Rooted Triplet Distance Between Phylogenetic Networks

Jesper Jansson, Konstantinos Mampentzidis, Wing-Kin Sung, Ramesh Rajaby

doi:10.1007/s00453-021-00802-1

Abstract

The rooted triplet distance measures the structural dissimilarity of two phylogenetic trees or phylogenetic networks by counting the number of rooted phylogenetic trees with exactly three leaf labels (called rooted triplets, or triplets for short) that occur as embedded subtrees in one, but not both, of them. Suppose that $N_1 = (V_1, E_1)$ and $N_2 = (V_2, E_2)$ are phylogenetic networks over a common leaf label set of size $n$, that $N_i$ has level $k_i$ and maximum in-degree $d_i$ for $i \in \{1, 2\}$, and that the networks’ out-degrees are unbounded. Write $N = \max(|V_1|, |V_2|)$, $M = \max(|E_1|, |E_2|)$, $k = \max(k_1, k_2)$, and $d = \max(d_1, d_2)$. Previous work has shown how to compute the rooted triplet distance between $N_1$ and $N_2$ in $\mathrm{O}(n \log n)$ time in the special case $k \le 1$. For $k > 1$, no efficient algorithms are known; applying a classic method from 1980 by Fortune et al. in a direct way leads to a running time of $\Omega(N^{6} n^{3})$, and the only existing non-trivial algorithm imposes restrictions on the networks’ in- and out-degrees (in particular, it does not work when non-binary vertices are allowed). In this article, we develop two new algorithms with no such restrictions. Their running times are $\mathrm{O}(N^{2} M + n^{3})$ and $\mathrm{O}(M + N k^{2} d^{2} + n^{3})$, respectively. We also provide implementations of our algorithms, evaluate their performance on simulated and real datasets, and make some observations on the limitations of the current definition of the rooted triplet distance in practice. Our prototype implementations have been packaged into the first publicly available software for computing the rooted triplet distance between unrestricted networks of arbitrary levels.
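To make the definition concrete, the following is a minimal brute-force sketch for the simplest special case only, namely two trees (no reticulations) over the same leaf labels. It is not one of the algorithms developed in the article. The nested-tuple encoding of a tree, the string encoding of a triplet, and the function names `leaf_paths`, `lca_depth`, `triplets`, and `triplet_distance` are all assumptions made for illustration. The sketch enumerates every set of three leaves, records which triplet the tree induces on it, and counts the triplets that occur in exactly one of the two trees, following the wording of the definition above.

```python
from itertools import combinations


def leaf_paths(tree, prefix=()):
    """Map each leaf label to its root-to-leaf path of child indices.

    Assumed encoding: a leaf is a string, an internal vertex is a tuple
    of subtrees, and leaf labels are unique.
    """
    if isinstance(tree, str):
        return {tree: prefix}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (index,)))
    return paths


def lca_depth(path_a, path_b):
    """Depth of the lowest common ancestor, i.e. the shared prefix length."""
    depth = 0
    for step_a, step_b in zip(path_a, path_b):
        if step_a != step_b:
            break
        depth += 1
    return depth


def triplets(tree):
    """Return the set of rooted triplets induced by a tree."""
    paths = leaf_paths(tree)
    found = set()
    for x, y, z in combinations(sorted(paths), 3):
        dxy = lca_depth(paths[x], paths[y])
        dxz = lca_depth(paths[x], paths[z])
        dyz = lca_depth(paths[y], paths[z])
        if dxy > dxz:        # x and y meet below the point where z joins
            found.add(f"{x},{y}|{z}")
        elif dxz > dxy:      # x and z meet below the point where y joins
            found.add(f"{x},{z}|{y}")
        elif dyz > dxy:      # y and z meet below the point where x joins
            found.add(f"{y},{z}|{x}")
        else:                # all three meet at the same (non-binary) vertex
            found.add(f"{x}|{y}|{z}")
    return found


def triplet_distance(tree_1, tree_2):
    """Count the triplets that occur in exactly one of the two trees."""
    return len(triplets(tree_1) ^ triplets(tree_2))


if __name__ == "__main__":
    t1 = ((("a", "b"), "c"), "d")
    t2 = ((("a", "c"), "b"), "d")
    print(triplet_distance(t1, t2))  # prints 2
```

The two example trees disagree only on the leaf set {a, b, c}, where the first tree contains the triplet a,b|c and the second contains a,c|b, so the symmetric difference has size 2. The sketch does not extend to networks, where a single leaf triple can be consistent with several triplets at once and testing for embedded subtrees requires more than a lowest-common-ancestor comparison.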

Full Text