Linear-Size CDAWG: New Repetition-Aware Indexing and Grammar Compression

Takuya Takagi, Shunsuke Inenaga, Keisuke Goto, Hiroki Arimura, Yuta Fujishige

doi:10.1007/978-3-319-67428-5_26

Abstract

In this paper, we propose a novel approach to combine compact directed acyclic word graphs (CDAWGs) and grammar-based compression. This leads us to an efficient self-index, called Linear-size CDAWGs (L-CDAWGs), which can be represented with $$O(\tilde{e}_T \log n)$$ bits of space allowing for $$O(\log n)$$-time random and $$O(1)$$-time sequential accesses to edge labels, and $$O(m \log \sigma + occ)$$-time pattern matching. Here, $$\tilde{e}_T$$ is the number of all extensions of maximal repeats in $$T$$, $$n$$ and $$m$$ are respectively the lengths of the text $$T$$ and a given pattern, $$\sigma$$ is the alphabet size, and $$occ$$ is the number of occurrences of the pattern in $$T$$. The repetitiveness measure $$\tilde{e}_T$$ is known to be much smaller than the text length $$n$$ for highly repetitive texts. For constant alphabets, our L-CDAWGs achieve $$O(m + occ)$$ pattern matching time with $$O(e_T^r \log n)$$ bits of space, which improves the pattern matching time of Belazzougui et al.’s run-length BWT-CDAWGs by a factor of $$\log \log n$$, with the same space complexity. Here, $$e_T^r$$ is the number of right extensions of maximal repeats in $$T$$. As a byproduct, our result gives a way of constructing a straight-line program (SLP) of size $$O(\tilde{e}_T)$$ for a given text $$T$$ in $$O(n + \tilde{e}_T \log \sigma)$$ time.
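
The abstract refers to straight-line programs (SLPs) without defining them. An SLP is a context-free grammar in Chomsky normal form that derives exactly one string. The sketch below is a toy illustration of that definition only, not the construction proposed in the paper: the rule table `RULES` and the helper names `length`, `expand`, and `access` are hypothetical and chosen for this example. It shows how a repetitive text can be represented by few rules, and how a single character can be read by descending the grammar instead of decompressing the whole text.

```python
from functools import lru_cache

# Hypothetical toy SLP (an assumption for illustration, not from the paper).
# Each rule is either a single terminal character or a pair of rule names.
RULES = {
    "A": "a",
    "B": "b",
    "X1": ("A", "B"),    # derives "ab"
    "X2": ("X1", "X1"),  # derives "abab"
    "X3": ("X2", "X2"),  # derives "abababab"
}


@lru_cache(maxsize=None)
def length(name):
    """Length of the string derived from rule `name`."""
    rule = RULES[name]
    if isinstance(rule, str):
        return 1
    left, right = rule
    return length(left) + length(right)


def expand(name):
    """Fully decompress rule `name` into the string it derives."""
    rule = RULES[name]
    if isinstance(rule, str):
        return rule
    left, right = rule
    return expand(left) + expand(right)


def access(name, i):
    """Return character i (0-based) of the derived string without expanding it.

    Walks one root-to-leaf path, so the cost is proportional to the
    height of the grammar's derivation tree.
    """
    if not 0 <= i < length(name):
        raise IndexError(i)
    rule = RULES[name]
    while not isinstance(rule, str):
        left, right = rule
        left_len = length(left)
        if i < left_len:
            name = left
        else:
            name, i = right, i - left_len
        rule = RULES[name]
    return rule
```

Here five rules describe a text of length eight, and doubling rules of this kind keep the grammar small as the repetition grows. The random-access bound stated in the abstract depends on the specific structure of the L-CDAWG and is not reproduced by this toy grammar.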
