Abstract

Duplication-Transfer-Loss (DTL) reconciliation is a widely used computational technique for understanding gene family evolution and inferring horizontal gene transfer (transfer for short) in microbes. However, most existing models and implementations of DTL reconciliation cannot account for the effect of unsampled or extinct species lineages on the evolution of gene families, likely affecting their accuracy. Accounting for the presence and possible impact of any unsampled species lineages, including those that are extinct, is especially important for inferring and studying horizontal transfer since many genes in the species lineages represented in the reconciliation analysis are likely to have been acquired through horizontal transfer from unsampled lineages. While models of DTL reconciliation that account for transfer from unsampled lineages have already been proposed, they use a relatively simple framework for transfer from unsampled lineages and cannot explicitly infer the location on the species tree of each unsampled or extinct lineage associated with an identified transfer event. Furthermore, no systematic study has yet assessed the impact of accounting for unsampled lineages on the accuracy of DTL reconciliation.
In this work, we address these deficiencies by (i) introducing an extended DTL reconciliation model, called the DTLx reconciliation model, that accounts for unsampled and extinct species lineages in a new, more functional manner compared to existing models, (ii) showing that optimal reconciliations under the new DTLx reconciliation model can be computed just as efficiently as under the fastest DTL reconciliation model, (iii) providing an efficient algorithm for sampling optimal DTLx reconciliations uniformly at random, (iv) performing the first systematic simulation study to assess the impact of accounting for unsampled lineages on the accuracy of DTL reconciliation, and (v) comparing the accuracies of inferring transfers from unsampled lineages under our new model and the only other previously proposed parsimony-based model for this problem.

Highlights

  • Understanding how genes and species evolve is fundamental to our understanding of biology

  • Given a nonempty subset L ⊆ Le(T), we denote by lca_T(L) the least common ancestor (LCA) of all the leaves in L in tree T; that is, lca_T(L) is the unique smallest upper bound of L under ≤_T

  • We labelled transfers from unsampled lineages as ground truth TX events if, in the event log generated by ZOMBI, the gene survives in an extant lineage but the original copy goes extinct
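
The LCA definition in the highlights can be illustrated with a small sketch. It assumes a rooted tree stored as a child-to-parent map and takes "upper bound under ≤_T" to mean a common ancestor, so the smallest upper bound is the deepest common ancestor. The function names (ancestors, lca_pair, lca) and the toy tree are hypothetical and not taken from the paper.

```python
from functools import reduce


def ancestors(parent, node):
    """Return the path from node up to the root, inclusive of node."""
    path = [node]
    while node in parent:  # the root has no entry in the parent map
        node = parent[node]
        path.append(node)
    return path


def lca_pair(parent, a, b):
    """Return the deepest node that is an ancestor of both a and b."""
    seen = set(ancestors(parent, a))
    # Walking up from b, the first node also above a is the LCA.
    for node in ancestors(parent, b):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def lca(parent, leaves):
    """LCA of a nonempty collection of leaves, folded pairwise."""
    leaves = list(leaves)
    if not leaves:
        raise ValueError("L must be nonempty")
    return reduce(lambda x, y: lca_pair(parent, x, y), leaves)


# Hypothetical toy tree: root has children x and c; x has children a and b.
parent = {"a": "x", "b": "x", "x": "root", "c": "root"}
print(lca(parent, ["a", "b"]))       # x
print(lca(parent, ["a", "b", "c"]))  # root
```

Folding pairwise is valid because the LCA of a set equals the LCA of any one element with the LCA of the rest. This walk-up approach takes time proportional to tree depth per pair; efficient reconciliation software would typically precompute LCAs instead.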


Summary

Introduction

Understanding how genes and species evolve is fundamental to our understanding of biology. Gene families evolve through complex evolutionary processes such as gene duplication, horizontal gene transfer (or transfer for short), homologous recombination, gene loss, and speciation. Duplication-Transfer-Loss (DTL) reconciliation is one of the most powerful computational techniques for studying microbial gene family evolution and for inferring evolutionary events such as transfer and gene duplication. Several different DTL reconciliation models and algorithms have been developed. Algorithms for DTL reconciliation take as input a gene tree (i.e., the evolutionary tree for a gene family) and a species tree (i.e., the evolutionary tree for the corresponding collection of species) and reconcile any topological differences between the two trees. DTL reconciliation has been rigorously studied over the last several years [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18].

Licensee MDPI, Basel, Switzerland.
