Abstract

The classical reconciliation of a gene tree with a species tree, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes that each family evolves independently. While this assumption is reasonable for genes that are far apart in the genome, it is not appropriate for genes grouped into syntenic blocks, which are more plausibly the result of concerted evolution. Here, we introduce the Super-Reconciliation problem, which consists of inferring a history of segmental duplication and loss events (each involving a set of neighboring genes) leading to a set of present-day syntenies from a single ancestral one. In other words, we extend the traditional Duplication-Loss reconciliation problem from a single gene tree to a set of trees, accounting for segmental duplications and losses. The existence of a Super-Reconciliation depends on the consistency of the individual gene trees. In addition, ignoring rearrangements implies that existence also depends on gene order consistency. We first show that the problem of reconstructing a most parsimonious Super-Reconciliation, if one exists, is NP-hard, and give an exact exponential-time algorithm to solve it. Alternatively, we show that accounting for rearrangements in the evolutionary model, while still minimizing only segmental duplication and loss events, leads to an exact polynomial-time algorithm. We finally assess the time efficiency of the exponential-time algorithm for the Duplication-Loss model on simulated datasets, and give a proof of concept on the opioid receptor genes.

Highlights

  • Gene gain and loss is known as a major force driving evolution

  • Complexity of the Super-Reconciliation problem: we have recently considered the problem of finding a supertree of a set of gene trees that minimizes the classical single-gene duplication distance or the single-gene duplication-and-loss distance

  • Simulations have been performed according to five parameters: t, the number of gene families in the ancestral synteny; d, the maximum depth of the balanced tree; pdupl, the probability for any given node to be a segmental duplication; ploss, the probability for a loss to occur under any given node; and plength, the probability to remove one gene in a segmental loss

  • If the values of Clca(vi) and C∗(vi) are known for the children v1, v2 of v, then Clca(v) and C∗(v) can be computed in constant time, assuming we have access to lcaSet(v) for every v ∈ T


Summary

Background

Gene gain and loss is known as a major force driving evolution. The classical method used for inferring these events is to reconstruct the tree of the gene family of interest and to embed it into the species phylogeny.

Simulations have been performed according to five parameters: t, the number of gene families in the ancestral synteny; d, the maximum depth of the balanced tree; pdupl, the probability for any given node to be a segmental duplication; ploss, the probability for a loss to occur under any given node; and plength, the probability to remove one gene in a segmental loss. If the values of Clca(vi) and C∗(vi) are known for the children v1, v2 of v, then Clca(v) and C∗(v) can be computed in constant time, assuming we have access to lcaSet(v) for every v ∈ T.

Time-efficiency values for inferring the Super-Reconciliation of a single tree are aggregated over 500 simulations per value of t, the size of the ancestral synteny.

[Figure: best-fit 4th-degree polynomial]
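
The classical embedding of a single gene tree into a species phylogeny, mentioned in the Background, is commonly done with the lowest common ancestor (LCA) mapping. The following is a minimal sketch of that standard single-gene technique, not of the Super-Reconciliation algorithm itself. The `Node` class, the helper names, and the toy trees are all hypothetical and chosen only for illustration. Binary gene trees are assumed, and losses are not counted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical rooted tree node (used for both gene and species trees)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None
    depth: int = 0


def make(name: str, *children: Node) -> Node:
    """Build a node and set the parent pointer of each child."""
    node = Node(name, list(children))
    for child in children:
        child.parent = node
    return node


def set_depths(root: Node, depth: int = 0) -> None:
    """Record the depth of every node so that lca() can climb correctly."""
    root.depth = depth
    for child in root.children:
        set_depths(child, depth + 1)


def lca(a: Node, b: Node) -> Node:
    """Lowest common ancestor of two nodes of the same tree."""
    while a is not b:
        if a.depth >= b.depth:
            a = a.parent
        else:
            b = b.parent
    return a


def reconcile(gene_root: Node,
              leaf_species: Dict[str, Node]) -> Tuple[Dict[str, Node], List[str]]:
    """LCA-map a binary gene tree into a species tree.

    Returns the mapping (gene node name -> species node) and the names of
    the gene nodes labeled as duplications.
    """
    mapping: Dict[str, Node] = {}
    duplications: List[str] = []

    def visit(g: Node) -> Node:
        if not g.children:
            mapping[g.name] = leaf_species[g.name]
            return mapping[g.name]
        left, right = (visit(c) for c in g.children)
        s = lca(left, right)
        mapping[g.name] = s
        # A node mapped to the same species node as one of its children
        # cannot be a speciation, so it is labeled as a duplication.
        if s is left or s is right:
            duplications.append(g.name)
        return s

    visit(gene_root)
    return mapping, duplications


# Toy species tree ((A,B)AB,C)ABC.
A, B, C = Node("A"), Node("B"), Node("C")
species_root = make("ABC", make("AB", A, B), C)
set_depths(species_root)

# Toy gene tree ((a1,b1)g1,(a2,c1)g2)g0.
gene_root = make("g0",
                 make("g1", Node("a1"), Node("b1")),
                 make("g2", Node("a2"), Node("c1")))
leaf_species = {"a1": A, "b1": B, "a2": A, "c1": C}

mapping, dups = reconcile(gene_root, leaf_species)
print(dups)  # ['g0']
```

In this toy example only the gene tree root is labeled as a duplication: it maps to the same species node (ABC) as its child g2. Each gene family is reconciled on its own here, which is precisely the independence assumption that the Super-Reconciliation model relaxes for genes in the same synteny.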