Alignment- and reference-free phylogenomics with colored de Bruijn graphs

Roland Wittler

doi:10.1186/s13015-020-00164-3

Abstract

Background: The increasing amount of available genome sequence data enables large-scale comparative studies. A common task is the inference of phylogenies, a challenging task if close reference sequences are not available, genome sequences are incompletely assembled, or the high number of genomes precludes multiple sequence alignment in reasonable time.

Results: We present a new whole-genome based approach to infer phylogenies that is alignment- and reference-free. In contrast to other methods, it does not rely on pairwise comparisons to determine distances to infer edges in a tree. Instead, a colored de Bruijn graph is constructed, and information on common subsequences is extracted to infer phylogenetic splits.

Conclusions: The introduced new methodology for large-scale phylogenomics shows high potential. Application to different datasets confirms robustness of the approach. A comparison to other state-of-the-art whole-genome based methods indicates comparable or higher accuracy and efficiency.

Highlights

A common task in comparative genomics is the reconstruction of the evolutionary relationships of species or other taxonomic entities, their phylogeny
We propose a new k-mer based method for phylogenetic inference that relies neither on alignments to a reference sequence nor on pairwise or multiple alignments to infer markers
Prevailing whole-genome approaches perform pairwise comparisons to determine a quadratic number of distances to infer a linear number of tree edges
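
The quadratic-versus-linear contrast in the last highlight can be made concrete with a small sketch. It is an illustration only, not code from the paper: the Jaccard distance on k-mer sets is just one assumed example of a pairwise characteristic, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(genomes, k):
    """Compare every unordered pair of genomes: n * (n - 1) / 2 comparisons."""
    profiles = {name: kmers(seq, k) for name, seq in genomes.items()}
    return {
        (x, y): jaccard_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


def unrooted_binary_tree_edges(n):
    """An unrooted binary tree with n >= 2 leaves has 2n - 3 edges."""
    return 2 * n - 3


# Toy input (hypothetical sequences, far too short for real use).
genomes = {
    "g1": "ACGTACGT",
    "g2": "ACGTACGA",
    "g3": "TTGTACGT",
    "g4": "TTGTACCA",
}
dist = pairwise_distances(genomes, k=3)
print(len(dist), "pairwise comparisons for", len(genomes), "genomes")
print(unrooted_binary_tree_edges(len(genomes)), "edges in the resulting tree")
```

The number of dictionary entries grows quadratically with the number of genomes, while the number of tree edges that these distances are used to infer grows only linearly.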

Summary

Introduction

A common task in comparative genomics is the reconstruction of the evolutionary relationships of species or other taxonomic entities, their phylogeny. The sheer number of genomes to be analyzed still poses limits in large-scale scenarios, as almost all whole-genome approaches are based on a pairwise comparison of some characteristics of the genomes (e.g. occurrences or frequencies of k-mers or other patterns) to define distances, which are then used to reconstruct a tree (e.g. by using neighbor joining [7]). This means that, for n genomes, O(n^2) comparisons are performed in order to infer O(n) edges.

If either part A or B of a split {A, B} is of cardinality one, the split is called trivial.
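
To make the notion of a split concrete, the following minimal sketch derives candidate splits directly from k-mer presence, without any pairwise comparison. It is a simplified stand-in under stated assumptions, not the author's implementation: a plain dictionary from k-mers to genome names replaces the colored de Bruijn graph, reverse complements are ignored, each split is scored by a simple k-mer count (an assumed weighting), and all names and sequences are hypothetical.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Return the set of all k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def color_sets(genomes, k):
    """Map each k-mer to the set of genome names (its 'colors') containing it."""
    colors = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            colors[kmer].add(name)
    return colors


def candidate_splits(genomes, k):
    """Count, for each split {A, B}, the k-mers present in exactly the genomes of A."""
    all_names = frozenset(genomes)
    counts = Counter()
    for names in color_sets(genomes, k).values():
        a = frozenset(names)
        b = all_names - a
        if not b:
            continue  # a k-mer shared by every genome separates nothing
        counts[frozenset((a, b))] += 1
    return counts


def is_trivial(split):
    """A split is trivial if one of its two parts has cardinality one."""
    return any(len(part) == 1 for part in split)


# Toy input (hypothetical sequences).
genomes = {"a": "AAAC", "b": "AAAC", "c": "GGGT", "d": "GGGT"}
splits = candidate_splits(genomes, k=3)
for split, weight in splits.items():
    parts = sorted(sorted(part) for part in split)
    print(parts, "weight", weight, "trivial" if is_trivial(split) else "non-trivial")
```

Each k-mer is visited once per genome in which it occurs, so the work scales with the input size rather than with the number of genome pairs.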

Methods

Results

Conclusion

Journal: Algorithms for Molecular Biology	Publication Date: Apr 7, 2020
Citations: 18	License type: open-access
