Dynamic construction of pan-genome subgraphs

Kadir Dede, Enno Ohlebusch

doi:10.1515/comp-2020-0018

Open Access

Journal: Open Computer Science	Publication Date: Apr 9, 2020
Citations: 7	License type: CC BY 4.0

Affiliation: University of Ulm

Abstract

Marcus et al. (Bioinformatics 2014) proposed to use a compressed de Bruijn graph as a description of a pan-genome, comprising the genomes of many individuals/strains of the same or closely related species. Subsequent work improved the construction of the compressed de Bruijn graph in terms of run-time and memory consumption. According to the Computational Pan-Genomics Consortium (Briefings in Bioinformatics 2016), a pan-genome data structure should support the following functionality: “All information within a data structure should be easily accessible for human eyes by visualization support on different scales.” However, a pan-genome graph can have thousands to millions of nodes, and such an amount of information is certainly not easily accessible for human eyes. Thus, the possibility to construct pan-genome subgraphs on demand would be quite valuable. In this article, we use the space-efficient representation of the compressed de Bruijn graph devised by Beller and Ohlebusch (Algorithms for Molecular Biology 2016) to construct pan-genome subgraphs on the fly. The user can specify a region in one of the genomes, and the software tool will build a subgraph that contains the path corresponding to that region and all paths that are in the neighborhood of that path. The size of the neighborhood can be controlled by the user.

Highlights

In the gene-based approach, one distinguishes between the core genome that contains genes shared by all strains within the clade, the dispensable genome, …
A k-mer and its reverse complement are not represented by the same node because we use single strands
A bi-directed graph representation is required because it is not known a priori from which strand a read originated

Summary

The overall picture: constructing a compressed de Bruijn subgraph on demand

We have defined uncompressed and compressed de Bruijn graphs. Figure 1 shows the uncompressed de Bruijn graph of the strings S1 = ACGAATCACCAA, S2 = ACGAATCAGCAA, and S3 = GCGAATCTTTCTTTTCAA for k = 3, while Figure 2 shows the corresponding compressed graph. We define the compressed de Bruijn subgraph relative to R with depth d to be the compressed de Bruijn graph containing all nodes u satisfying dist(u, v) ≤ d for a node v ∈ R. Two cases follow from this definition:

1. If there is a path of suitable length from u to a node v in R, u will be in the subgraph.
2. If there is a path of suitable length from a node v in R to u, u will not be in the subgraph (unless case 1 applies).

We use this definition of subgraph because our construction algorithm is based on a backward search procedure. Otherwise, the memory requirements would be much higher because two index data structures would be necessary: the wavelet tree of the BWT of S and the wavelet tree of the BWT of the reverse of S.
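The asymmetry of this definition can be illustrated with a minimal Python sketch. It is not the authors' BWT-based algorithm: for simplicity it works on the uncompressed de Bruijn graph held in plain dictionaries, and it assumes that an edge joins two k-mers whenever they occur consecutively in one of the input strings. The function names `de_bruijn_edges` and `backward_neighborhood` are hypothetical, and the region R is assumed here to be the k-mers of the substring GAATC.

```python
from collections import deque


def de_bruijn_edges(strings, k):
    """Map each k-mer to the set of k-mers that directly follow it in some string."""
    succ = {}
    for s in strings:
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            succ.setdefault(kmer, set())
            if i + k < len(s):
                succ[kmer].add(s[i + 1:i + k + 1])
    return succ


def backward_neighborhood(succ, region, d):
    """Return every node u with dist(u, v) <= d for some node v in region.

    Only predecessors are followed (a breadth-first search against the
    edge direction), mirroring the backward search mentioned in the text.
    """
    # Invert the successor map so that predecessors can be looked up.
    pred = {u: set() for u in succ}
    for u, followers in succ.items():
        for v in followers:
            pred[v].add(u)

    dist = {v: 0 for v in region}
    queue = deque(region)
    while queue:
        v = queue.popleft()
        if dist[v] == d:
            continue  # depth limit reached, do not expand further
        for u in pred[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return set(dist)


S1 = "ACGAATCACCAA"
S2 = "ACGAATCAGCAA"
S3 = "GCGAATCTTTCTTTTCAA"
succ = de_bruijn_edges([S1, S2, S3], 3)
region = ["GAA", "AAT", "ATC"]  # assumed region: k-mers of GAATC
print(sorted(backward_neighborhood(succ, region, 1)))
```

With depth 1, the only node added to the region is CGA, which precedes GAA in all three strings. The node TCA, which follows ATC, is never added at any depth, because no path leads from TCA back into the region; this corresponds to case 2 of the definition.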

Determining node identifiers with the two bit vectors B_r and B_l

Generating a node

Finding a predecessor node

Finding a path

Results

Discussion
