Abstract

Combining a set of trees on partial datasets into a single tree is a classical method for inferring large phylogenetic trees. Ideally, the combined tree should display each input partial tree, which is only possible if the input trees do not contain contradictory phylogenetic information. The simplest version of the supertree problem is thus to decide whether a set of trees is compatible and, if so, to construct a tree displaying them all. Classically, supertree methods have been applied to the reconstruction of species trees. Here we instead consider reconstructing a super gene tree in light of a known species tree S. We define the supergenetree problem as finding, among all supertrees displaying a set of input gene trees, one supertree minimizing a reconciliation distance with S. We first show how classical exact methods for the supertree problem can be extended to the supergenetree problem. As all these methods are highly exponential, we also exhibit a natural greedy heuristic for the duplication cost, based on minimizing the set of duplications preceding the first speciation event. We then show that both the supergenetree problem and its restriction to minimizing duplications preceding the first speciation are NP-hard to approximate within an n^(1-ϵ) factor, for any 0 < ϵ < 1. Finally, we show that a restriction of this problem to uniquely labeled speciation gene trees, which is relevant to many biological applications, is also NP-hard. We thus introduce new avenues in the field of supertrees and set the theoretical basis for exploring various algorithmic aspects of these problems.

Highlights

  • A fundamental task in evolutionary biology is to combine a collection of rooted trees on partial, possibly overlapping, sets of data, into a single rooted tree on the full set of data

  • We show that the supergenetree problem for the duplication cost, and even its restricted version, the Minimum pre-Speciation Duplication problem, are NP-hard to approximate within an n^(1-ϵ) factor, for any 0 < ϵ < 1 (n being the number of genes)

  • We introduce the supergenetree problem, which aims at constructing a supertree that displays a set of input gene trees while minimizing the reconciliation cost with respect to an input species tree


Summary

Introduction

A fundamental task in evolutionary biology is to combine a collection of rooted trees on partial, possibly overlapping, sets of data into a single rooted tree on the full set of data.

A greedy heuristic for the duplication cost

Instead of trying all partitions of the set of triplet graph components at each step of the AllTrees or AllMinTrees algorithms, a natural greedy approach when the goal is to minimize the duplication cost is to choose the best partition at each iteration, namely the one that minimizes the number of duplications preceding each speciation event. Such an approach pushes duplications down the tree.

We show that a graph H is k-colorable if and only if the corresponding set of input trees is compatible with some gene tree having at most k − 1 pre-speciation duplications when reconciled with S. In both cases, the gene-species mapping s is bijective, proving the second part of the theorem statement. Moreover, the 3-colorability problem remains NP-hard on 4-regular graphs, which shows that the problem is NP-hard even on input trees having at most 4 leaves.
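The compatibility test that underlies the triplet graph mentioned above can be illustrated with a minimal sketch of the classical BUILD procedure on rooted triplets. This is not the AllTrees or AllMinTrees algorithm and it ignores the species tree S entirely: it places exactly one child per connected component of the triplet graph, whereas the exact methods explore partitions of those components. The function name `build`, the encoding of a triplet ab|c as the tuple `(a, b, c)`, and the nested-tuple tree representation are assumptions made for this illustration.

```python
def build(leaves, triplets):
    """Return a nested-tuple tree displaying every rooted triplet, or None.

    Each triplet (a, b, c) encodes ab|c: a and b are closer to each other
    than either is to c. Illustrative sketch only; names are hypothetical.
    """
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))

    # Keep only the triplets whose three leaves all lie in the current set.
    active = [t for t in triplets if set(t) <= leaves]

    # Triplet graph: one vertex per leaf, one edge a-b per triplet ab|c.
    # Connected components are tracked with a small union-find.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _ in active:
        parent[find(a)] = find(b)

    components = {}
    for x in leaves:
        components.setdefault(find(x), set()).add(x)

    # A single component on two or more leaves means no root split exists.
    if len(components) == 1:
        return None

    children = []
    for comp in sorted(components.values(), key=sorted):
        subtree = build(comp, active)
        if subtree is None:
            return None
        children.append(subtree)
    return tuple(children)


compatible = build(["a", "b", "c", "d"], [("a", "b", "c"), ("c", "d", "a")])
conflicting = build(["a", "b", "c"], [("a", "b", "c"), ("a", "c", "b")])
```

In this sketch the first call yields a tree grouping a with b and c with d, while the second returns `None` because the triplets ab|c and ac|b contradict each other. A greedy variant in the spirit of the heuristic described above would, at each recursive step, choose how to group the components into children rather than emitting one child per component.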
