Compression Algorithm for Colored de Bruijn Graphs.

Amatur Rahman, Yoann Dufresne, Paul Medvedev

doi:10.4230/lipics.wabi.2023.17

Abstract

A colored de Bruijn graph (also called a set of k-mer sets) is a set of k-mers with every k-mer assigned a set of colors. Colored de Bruijn graphs are used in a variety of applications, including variant calling, genome assembly, and database search. However, their size has posed a scalability challenge to algorithm developers and users. Numerous indexing data structures have been proposed that store the graph compactly while supporting fast query operations. However, disk compression algorithms, which do not need to support queries on the compressed data and can thus be more space-efficient, have received little attention. The dearth of specialized compression tools has been a detriment to tool developers, tool users, and reproducibility efforts. In this paper, we develop a new tool that compresses colored de Bruijn graphs to disk, building on previous ideas for compression of k-mer sets and indexing colored de Bruijn graphs. We test our tool, called ESS-color, on various datasets, including both sequencing data and whole genomes. ESS-color achieves better compression than all evaluated tools on all datasets, with no other tool able to consistently achieve less than 44% space overhead.
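
To make the definition in the abstract concrete, the following is a minimal sketch of a colored de Bruijn graph viewed as a map from each k-mer to its set of colors. It is not the ESS-color method and involves no compression. The function name `build_colored_kmer_sets`, the toy sequences, and the choice to use sample indices as colors are assumptions made for illustration; reverse complements are ignored for simplicity.

```python
from collections import defaultdict


def build_colored_kmer_sets(samples, k):
    """Map each k-mer to the set of colors (sample indices) containing it.

    `samples` is a list of samples; each sample is a list of DNA strings.
    Hypothetical helper for illustration only: no canonical k-mers,
    no compression.
    """
    colors_of = defaultdict(set)
    for color, sequences in enumerate(samples):
        for seq in sequences:
            # Slide a window of length k over the sequence.
            for i in range(len(seq) - k + 1):
                colors_of[seq[i:i + k]].add(color)
    return dict(colors_of)


# Two toy samples (colors 0 and 1) that share some 3-mers.
samples = [["ACGTA"], ["CGTAC"]]
graph = build_colored_kmer_sets(samples, k=3)

for kmer in sorted(graph):
    print(kmer, sorted(graph[kmer]))
```

In this toy input, the k-mers shared by both samples carry both colors, which is the kind of redundancy a specialized compressor could plausibly exploit.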


More From: LIPIcs : Leibniz international proceedings in informatics

Similar Papers

Compression algorithm for colored de Bruijn graphs
Amatur Rahman ... Paul Medvedev
Algorithms for Molecular Biology | VOL. 19
26 May 2024

Extremely fast construction and querying of compacted and colored de Bruijn graphs with GGCAT.
Andrea Cracco ... Alexandru I Tomescu
Genome research | VOL. 33
30 May 2023

Building large updatable colored de Bruijn graphs via merging.
Martin D Muggli ... Christina Boucher
Bioinformatics (Oxford, England) | VOL. 35
05 Jul 2019

Compression of Multiple k-Mer Sets by Iterative SPSS Decomposition
01 Jan 2020
