Self-Indexed Grammar-Based Compression

Francisco Claude, Gonzalo Navarro

doi:10.3233/fi-2011-565

Abstract

Self-indexes aim at representing text collections in a compressed format that allows extracting arbitrary portions and also offers indexed searching on the collection. Current self-indexes are unable to fully exploit the redundancy of highly repetitive text collections that arise in several applications. Grammar-based compression is well suited to exploit such repetitiveness. We introduce the first grammar-based self-index. It builds on Straight-Line Programs (SLPs), a rather general kind of context-free grammar. If an SLP of $n$ rules represents a text $T[1,u]$, then an SLP-compressed representation of $T$ requires $2n\log_2 n$ bits. For that same SLP, our self-index takes $O(n\log n) + n\log_2 u$ bits. It extracts any text substring of length $m$ in time $O((m+h)\log n)$, and finds $occ$ occurrences of a pattern string of length $m$ in time $O((m(m+h) + h\,occ)\log n)$, where $h$ is the height of the parse tree of the SLP. No previous grammar representation had achieved $o(n)$ search time. As byproducts we introduce (i) a representation of SLPs that takes $2n\log_2 n\,(1+o(1))$ bits and efficiently supports more operations than a plain array of rules; (ii) a representation for binary relations with labels supporting various extended queries; (iii) a generalization of our self-index to grammar compressors that reduce $T$ to a sequence of terminals and nonterminals, such as Re-Pair and LZ78.
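To make the SLP model concrete, here is a minimal sketch assuming a plain, uncompressed array of rules in which each rule is either a single terminal or a pair of indices of earlier rules. It extracts a substring by descending from the root and entering only the subtrees that overlap the requested range, so it visits $O(m + h)$ nodes for a substring of length $m$. The rule encoding and the names `rule_lengths` and `extract` are hypothetical illustrations, not the authors' representation; in particular, the sketch does not model the compact data structures behind the paper's $O((m+h)\log n)$ bound.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a rule is either a terminal character (a str of
# length 1) or a pair (left, right) of indices of earlier rules.
Rule = Union[str, Tuple[int, int]]


def rule_lengths(rules: List[Rule]) -> List[int]:
    """Length of the string each rule expands to (rules in topological order)."""
    lengths: List[int] = []
    for r in rules:
        if isinstance(r, str):
            lengths.append(1)
        else:
            left, right = r
            lengths.append(lengths[left] + lengths[right])
    return lengths


def extract(rules: List[Rule], lengths: List[int], root: int, start: int, end: int) -> str:
    """Return T[start:end] (0-based, end exclusive) without expanding all of T."""
    out: List[str] = []

    def visit(i: int, lo: int, hi: int) -> None:
        # [lo, hi) is the requested range, relative to the expansion of rule i.
        if lo >= hi:
            return
        r = rules[i]
        if isinstance(r, str):
            out.append(r)
            return
        left, right = r
        split = lengths[left]  # expansion of `left` covers [0, split)
        if lo < split:
            visit(left, lo, min(hi, split))
        if hi > split:
            visit(right, max(lo, split) - split, hi - split)

    visit(root, start, end)
    return "".join(out)
```

For example, with rules `a`, `b`, X2 → X0 X1, X3 → X2 X2 and X4 → X3 X3 (so X4 expands to `abababab`), `extract(rules, rule_lengths(rules), 4, 1, 6)` returns `"babab"`. Storing each pair as two rule indices of about $\log_2 n$ bits each is consistent with the roughly $2n\log_2 n$ bits quoted above for a plain SLP representation.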

Journal: Fundamenta Informaticae
Publication Date: Jan 1, 2011
Citations: 136

Similar Papers

A separation between RLSLPs and LZ77
Philip Bille ... Nicola Prezza
Journal of Discrete Algorithms | VOL. 50
01 May 2018

Compression by Contracting Straight-Line Programs

-

01 Jul 2021

Faster Fully Compressed Pattern Matching by Recompression
Artur Jeż
-
01 Jan 2012

Faster Fully Compressed Pattern Matching by Recompression
Artur Jeż
ACM Transactions on Algorithms | VOL. 11
13 Jan 2015
