Abstract

Let a text T[1..n] be the only string generated by a context-free grammar with g (terminal and nonterminal) symbols and of size G (measured as the sum of the lengths of the right-hand sides of the rules). Such a grammar, called a grammar-compressed representation of T, can be encoded using G lg G bits. We introduce the first grammar-compressed index that uses O(G lg n) bits (precisely, G lg n + (2+ϵ) G lg g bits for any constant ϵ > 0) and can find the occ occurrences of a pattern P[1..m] in time O((m² + occ) lg G). We implement the index and demonstrate its practicality in comparison with the state of the art on highly repetitive text collections.
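
To make the quantities n, g, and G concrete, the following is a minimal sketch on a toy grammar. The grammar, the `rules` dictionary layout, and the helper names `expand` and `grammar_stats` are illustrative assumptions and are not taken from the paper. The sketch only shows how a grammar that generates a single string is measured; it does not implement the index or its search algorithm.

```python
from math import log2

# Hypothetical toy grammar (an assumption for illustration only).
# Every nonterminal has exactly one rule and there are no cycles,
# so the grammar generates exactly one string.
rules = {
    "S": ["A", "B", "A"],
    "A": ["B", "B"],
    "B": ["a", "b"],
}


def expand(symbol, rules):
    """Return the unique string derived from `symbol`."""
    if symbol not in rules:  # a terminal expands to itself
        return symbol
    return "".join(expand(s, rules) for s in rules[symbol])


def grammar_stats(rules):
    """Return (g, G): the number of distinct symbols, and the grammar size
    measured as the sum of the right-hand-side lengths."""
    symbols = set(rules)  # nonterminals
    for rhs in rules.values():
        symbols.update(rhs)  # add terminals and nonterminals used on the right
    g = len(symbols)
    G = sum(len(rhs) for rhs in rules.values())
    return g, G


text = expand("S", rules)
n = len(text)
g, G = grammar_stats(rules)

# Space figures from the abstract, evaluated for this toy grammar.
plain_bits = G * log2(G)      # G lg G
index_bits_main = G * log2(n)  # leading G lg n term of the index

print(text, n, g, G)
```

On so small an input the grammar is no smaller than the text. The bounds become meaningful on highly repetitive collections, where G can be far smaller than n.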
