Faster repetition-aware compressed suffix trees based on Block Trees

Manuel Cáceres, Gonzalo Navarro

doi:10.1016/j.ic.2021.104749

Open Access

Abstract

The suffix tree is a fundamental data structure in stringology, but its space usage, though linear, is an important problem in applications like bioinformatics. We design and implement a new compressed suffix tree (CST) targeted at highly repetitive texts, such as large genomic collections of the same species. Our first contribution is to enhance the Block Tree, a data structure that captures the repetitiveness of its input sequence, to represent the topology of trees with large repeated subtrees. Our so-called Block-Tree Compressed Topology (BT-CT) data structure augments the Block Tree nodes with data that speeds up tree navigation. Our Block-Tree CST (BT-CST), in turn, uses the BT-CT to compress the topology of the suffix tree, and also replaces the sampling of the suffix array and its inverse with grammar- and/or Block-Tree-based representations of those arrays.

Our experimental results show that BT-CTs reach navigation speeds comparable to compact tree representations that are insensitive to repetitiveness, while using 2–10 times less space on the topologies of the suffix trees of repetitive collections. Our BT-CST is slightly larger than previous repetition-aware suffix trees based on grammar-compressed topologies, but outperforms them in time, often by orders of magnitude.
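To make the idea of "topologies with large repeated subtrees" concrete, the following is a minimal sketch. It assumes the tree topology is serialized as a balanced-parentheses string, a common encoding for compact trees that the abstract does not spell out. The function names `bp_encode` and `blocks_with_earlier_copy` are hypothetical, and the single-level block check is only a simplified illustration of the kind of repetition a Block Tree can exploit, not the BT-CT structure itself.

```python
def bp_encode(tree):
    """Serialize a tree, given as nested lists of children, as balanced parentheses."""
    return "(" + "".join(bp_encode(child) for child in tree) + ")"


def blocks_with_earlier_copy(seq, block_len):
    """Count aligned blocks whose content already occurs at an earlier position.

    Simplified illustration (an assumption, not the paper's construction):
    such blocks could be stored as a pointer to the earlier occurrence
    instead of being stored explicitly.
    """
    count = 0
    for start in range(0, len(seq) - block_len + 1, block_len):
        block = seq[start:start + block_len]
        # str.find returns the leftmost occurrence, which is at most `start`.
        if seq.find(block) < start:
            count += 1
    return count


# A small subtree repeated three times under one root.
subtree = [[], [[], []]]
topology = bp_encode([subtree, subtree, subtree])
replaceable = blocks_with_earlier_copy(topology, 4)
```

Repeated subtrees produce repeated substrings in the parentheses sequence, so some blocks can point back to an earlier copy. A real Block Tree applies this idea hierarchically and stores extra per-node data to support navigation.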

Journal: Information and Computation	Publication Date: Apr 28, 2021
Citations: 2	License type: cc-by

Similar Papers

Faster Repetition-Aware Compressed Suffix Trees Based on Block Trees
Manuel Cáceres ... Gonzalo Navarro
01 Jan 2019

Fully compressed suffix trees
Luís M S Russo ... Arlindo L Oliveira
ACM Transactions on Algorithms | VOL. 7
01 Sep 2011

Faster entropy-bounded compressed suffix trees
Johannes Fischer ... Gonzalo Navarro
Theoretical Computer Science | VOL. 410
15 Sep 2009

Compressed suffix tree—a basis for genome-scale sequence analysis
N Valimaki ... W Gerlach
Bioinformatics | VOL. 23
19 Jan 2007
