Abstract

The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer are examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in reconstructing potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, and reconstruction is usually infeasible for more than a few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Galápagos Islands.

Highlights

  • Since the publication of the first draft of the human genome [1, 2], there has been an explosion in genomic data

  • Evolution occurs through different mechanisms, including point mutations, gene duplication, horizontal gene transfer, and recombinations

  • We present a framework, based on the mathematical tools of computational topology, that can explicitly accommodate both recombination and mutation events across the evolutionary history of a sample of genomic sequences

Introduction

Since the publication of the first draft of the human genome [1, 2], there has been an explosion in genomic data. In the last few years, the richer and more comprehensive genomic characterization of many organisms has underscored the need for representations that are not strictly tree-like. Phenomena such as horizontal gene transfer in bacteria [6], the ability of viruses to borrow and lend genes across species, and hybridization in metazoa (in plants, in particular [7, 8]) are exposing some of the limitations imposed by tree-like phylogenetic structures. Of particular interest are representations that reduce to trees when evolution is tree-like; that capture genetic relations between ancestors and identify genomic regions originating from different ancestral lineages; and, more generally, that allow the observed data to be interpreted in terms of a chronological sequence of events.
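To make concrete why recombination is at odds with a strictly tree-like history, the following minimal sketch applies the classical four-gamete test to binary sequences. This test is not named in the text above and is not the TARGet method; it is a standard check, assumed here purely for illustration. It also assumes that each site mutates at most once (the infinite-sites model) and that sequences are equal-length strings of '0' and '1'. The function name `incompatible_site_pairs` and the toy data are hypothetical.

```python
from itertools import combinations

# All four two-site haplotypes ("gametes") over a binary alphabet.
ALL_FOUR = {("0", "0"), ("0", "1"), ("1", "0"), ("1", "1")}


def incompatible_site_pairs(sequences):
    """Return the pairs of site indices (i, j) at which all four gametes occur.

    Under the infinite-sites assumption, such a pair cannot be explained by
    mutations on a single tree, so it is evidence of recombination somewhere
    between the two sites.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")

    pairs = []
    for i, j in combinations(range(length), 2):
        # Collect the distinct two-site haplotypes seen in the sample.
        gametes = {(s[i], s[j]) for s in sequences}
        if ALL_FOUR <= gametes:
            pairs.append((i, j))
    return pairs


# Toy sample consistent with a tree: each site splits the sample once,
# and the splits are nested.
tree_like = ["000", "100", "110", "111"]

# Toy sample in which both sites show all four combinations.
recombinant = ["00", "01", "10", "11"]

print(incompatible_site_pairs(tree_like))
print(incompatible_site_pairs(recombinant))
```

A sample with no incompatible pairs admits a tree-like explanation under these assumptions, whereas any incompatible pair calls for a representation that accommodates recombination.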

