Abstract

Given a set of species whose evolution is represented by a species tree, a gene family is a group of genes that have evolved from a single ancestral gene. A gene family evolves along the branches of a species tree through various mechanisms, including, but not limited to, speciation ($\mathbb{S}$), gene duplication ($\mathbb{D}$), gene loss ($\mathbb{L}$), and horizontal gene transfer ($\mathbb{T}$). The reconstruction of a gene tree representing the evolution of a gene family constrained by a species tree is an important problem in phylogenomics. However, unlike the multispecies coalescent evolutionary model, which considers only speciation and incomplete lineage sorting events, very little is known about the search space of gene family histories that account for gene duplication, gene loss and horizontal gene transfer (the $\mathbb{DLT}$-model). In this work, we introduce the notion of evolutionary histories, defined as binary ordered rooted trees describing the evolution of a gene family constrained by a species tree in the $\mathbb{DLT}$-model. We provide formal grammars describing the set of all evolutionary histories that are compatible with a given species tree, whether it is ranked or unranked. These grammars allow us, using either analytic combinatorics or dynamic programming, to efficiently compute the number of histories of a given size and to generate random histories of a given size under the uniform distribution. We apply these tools to obtain exact asymptotics for the number of gene family histories for two species trees, the rooted caterpillar and the complete binary tree, as well as estimates of the range of the exponential growth factor of the number of histories for random species trees of size up to 25. Our results show that including horizontal gene transfers induces a dramatic increase in the number of evolutionary histories.
We also show that, within ranked species trees, the number of evolutionary histories in the {mathbb {D}}{mathbb {L}}{mathbb {T}}-model is almost independent of the species tree topology. These results establish firm foundations for the development of ensemble methods for the prediction of reconciliations.
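The pipeline described in the abstract (a grammar, dynamic programming to count objects of a given size, then uniform random generation) can be illustrated on a much simpler class than evolutionary histories. The sketch below assumes the toy grammar T = Leaf | Node(T, T) for binary ordered rooted trees, with size measured in leaves; it does not encode speciation, duplication, loss or transfer, and the names `count`, `sample`, `num_leaves` and `growth_ratio` are hypothetical. Sampling uses the classical recursive method, and the ratio of consecutive counts gives a rough estimate of the exponential growth factor, which is 4 for this class (Catalan numbers).

```python
import random
from functools import lru_cache

# Toy grammar (an assumption for illustration, NOT the DLT-model grammar):
#   T = Leaf | Node(T, T)    -- binary ordered rooted trees, size = number of leaves
LEAF = "leaf"


@lru_cache(maxsize=None)
def count(n: int) -> int:
    """Number of trees with n >= 1 leaves, by dynamic programming on the grammar."""
    if n == 1:
        return 1  # T = Leaf
    # T = Node(T, T): split the n leaves into k on the left and n - k on the right.
    return sum(count(k) * count(n - k) for k in range(1, n))


def sample(n: int, rng: random.Random):
    """Draw a tree with n leaves uniformly at random (recursive method)."""
    if n == 1:
        return LEAF
    # Pick the left size k with probability count(k) * count(n - k) / count(n),
    # then sample both subtrees independently and uniformly.
    r = rng.randrange(count(n))
    for k in range(1, n):
        weight = count(k) * count(n - k)
        if r < weight:
            return (sample(k, rng), sample(n - k, rng))
        r -= weight
    raise AssertionError("unreachable: weights sum to count(n)")


def num_leaves(tree) -> int:
    """Size of a tree, i.e. its number of leaves."""
    return 1 if tree == LEAF else num_leaves(tree[0]) + num_leaves(tree[1])


def growth_ratio(n: int) -> float:
    """count(n + 1) / count(n): a crude estimate of the exponential growth factor."""
    return count(n + 1) / count(n)


if __name__ == "__main__":
    print([count(n) for n in range(1, 8)])  # [1, 1, 2, 5, 14, 42, 132]
    print(sample(5, random.Random(1)))      # one uniformly drawn 5-leaf tree
    print(growth_ratio(200))                # slowly approaches 4 as n grows
```

Each tree of size n is returned with probability exactly 1/count(n), because the left size is chosen in proportion to the number of trees with that split. A grammar for histories would add one production per evolutionary event, but the counting and sampling logic keeps the same shape.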

Highlights

  • A gene tree represents the evolution of a gene family, a group of genes assumed to descend from a single ancestral gene

  • The rooted caterpillar tree $CT_k$ can be defined as follows: $CT_1$ is the tree reduced to a single leaf, while $CT_k$ ($k > 1$) is the tree formed by a left subtree equal to $CT_{k-1}$ and a right subtree equal to $CT_1$

  • Our work introduces the first results on counting and sampling evolutionary scenarios in models accounting for gene duplication, gene loss and horizontal gene transfer (HGT)
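The recursive definition of the caterpillar tree translates directly into code. This is a minimal sketch, assuming trees are encoded as nested Python tuples with the string "leaf" for leaves; the helpers `caterpillar` and `to_newick`, and the leaf labels s1, s2, ..., are hypothetical and only serve to make the shape visible.

```python
import itertools
from typing import Tuple, Union

# Hypothetical encoding: a leaf is the string "leaf", an internal node is a
# (left, right) tuple, so the order of the children is preserved.
LEAF = "leaf"
Tree = Union[str, Tuple["Tree", "Tree"]]


def caterpillar(k: int) -> Tree:
    """Rooted caterpillar CT_k: CT_1 is a single leaf, CT_k = (CT_{k-1}, CT_1)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    tree: Tree = LEAF  # CT_1
    for _ in range(k - 1):
        tree = (tree, LEAF)  # left subtree CT_{k-1}, right subtree CT_1
    return tree


def to_newick(tree: Tree) -> str:
    """Newick string with leaves labelled s1, s2, ... from left to right."""
    labels = (f"s{i}" for i in itertools.count(1))

    def walk(t: Tree) -> str:
        if t == LEAF:
            return next(labels)
        left, right = t
        return f"({walk(left)},{walk(right)})"

    return walk(tree) + ";"


print(to_newick(caterpillar(4)))  # (((s1,s2),s3),s4);
```

Every internal node of $CT_k$ has a single leaf as its right child, which makes the caterpillar the most unbalanced rooted binary tree on $k$ leaves.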

Introduction

A gene tree represents the evolution of a gene family, a group of genes assumed to descend from a single ancestral gene. The reconstruction of gene trees from molecular sequence data is a central but difficult problem in computational biology. Gene trees are commonly observed to be incongruent with species trees (Maddison 1997; Degnan and Rosenberg 2006; Degnan et al. 2012; Disanto and Rosenberg 2014; Disanto et al. 2019). This discrepancy has motivated intense research on the problem of reconstructing the gene tree of a gene family conditional on a given species tree for the considered species. We refer to Szöllősi and Daubin (2012) and Szöllősi et al. (2015) for extensive reviews that discuss how gene trees evolve within a species tree and describe existing models and methods for reconstructing gene trees within species trees.
