Abstract

Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. Given a grammar, the random access problem is to compactly represent the grammar while supporting random access, that is, given a position in the original uncompressed string, report the character at that position. In this paper we study the random access problem with the finger search property, that is, the time for a random access query should depend on the distance between a specified index f, called the finger, and the query index i. We consider both a static variant, where we first place a finger and subsequently access indices near the finger efficiently, and a dynamic variant that also supports moving the finger in time that depends on the distance moved. Let n be the size of the grammar, and let N be the size of the string. For the static variant we give a linear space representation that supports placing the finger in O(log N) time and subsequently accessing in O(log D) time, where D is the distance between the finger and the accessed index. For the dynamic variant we give a linear space representation that supports placing the finger in O(log N) time and accessing and moving the finger in O(log D + log log N) time. Compared to the best linear space solution to random access, we improve an O(log N) query bound to O(log D) for the static variant and to O(log D + log log N) for the dynamic variant, while maintaining linear space. As an application of our results we obtain an improved solution to the longest common extension problem in grammar compressed strings. To obtain our results, we introduce several new techniques of independent interest, including a novel van Emde Boas style decomposition of grammars.

Highlights

  • Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes including the Lempel-Ziv family [49, 48, 46], Sequitur [35], Run-Length Encoding, Re-Pair [32], and many more [40, 20, 29, 30, 47, 4, 2, 3, 26].

  • The random access problem is one of the most basic primitives for computation on grammar compressed strings, and solutions to the problem are a key component in a wide range of algorithms and data structures for grammar compressed strings [9, 10, 21, 22, 23, 8, 28, 42, 43, 6].

  • In this paper we study the random access problem with the finger search property, that is, the time for a random access query should depend on the distance between a specified index f, called the finger, and the query index i.


Summary

Introduction

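Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes including the Lempel-Ziv family [49, 48, 46], Sequitur [35], Run-Length Encoding, Re-Pair [32], and many more [40, 20, 29, 30, 47, 4, 2, 3, 26]. We study the random access problem with the finger search property in two variants. The first variant is static finger search, where a setfinger operation places the finger and subsequent access operations to indices near the finger are efficient. The second variant is dynamic finger search, where we additionally support a movefinger operation that updates the finger such that the update time depends on the distance the finger is moved. Let n be the size of the grammar and N the size of the string. For the static finger search problem, we give an O(n) space representation that supports setfinger in O(log N) time and access in O(log D) time, where D is the distance between the finger and the accessed index. For the dynamic finger search problem, we give a linear space representation that supports setfinger in O(log N) time and both access and movefinger in O(log D + log log N) time. Compared to the best linear space solution to random access, we improve the O(log N) bound to O(log D) for the static version and to O(log D + log log N) for the dynamic version, while maintaining linear space. These are the first non-trivial bounds for the finger search problems.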
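To make the setfinger and access operations concrete, the following is a minimal sketch of the interface only. It is not the data structure from the paper and does not achieve the stated bounds: both operations here take time proportional to the height of the grammar. The class name `SLP`, the method name `access_from_finger`, the Chomsky-normal-form rule encoding, and 0-based positions are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple, Union

# Assumed encoding: a rule is a single terminal character, or a pair of
# nonterminal names (left, right). Every nonterminal has exactly one rule.
Rule = Union[str, Tuple[str, str]]


class SLP:
    """Naive random access on a grammar; illustrative only."""

    def __init__(self, rules: Dict[str, Rule], start: str) -> None:
        self.rules = rules
        self.start = start
        self.size: Dict[str, int] = {}  # expansion length of each nonterminal
        self._compute_size(start)
        # Root-to-leaf path for the finger: (nonterminal, offset in the string).
        self.finger_path: List[Tuple[str, int]] = []

    def _compute_size(self, x: str) -> int:
        if x not in self.size:
            r = self.rules[x]
            if isinstance(r, str):
                self.size[x] = 1
            else:
                self.size[x] = self._compute_size(r[0]) + self._compute_size(r[1])
        return self.size[x]

    def _descend(self, path: List[Tuple[str, int]], i: int) -> str:
        # Precondition: the last node on `path` covers position i.
        x, off = path[-1]
        while not isinstance(self.rules[x], str):
            left, right = self.rules[x]
            if i < off + self.size[left]:
                x = left
            else:
                off += self.size[left]
                x = right
            path.append((x, off))
        return self.rules[x]

    def _check(self, i: int) -> None:
        if not 0 <= i < self.size[self.start]:
            raise IndexError(i)

    def access(self, i: int) -> str:
        """Plain random access: descend from the start symbol."""
        self._check(i)
        return self._descend([(self.start, 0)], i)

    def setfinger(self, f: int) -> None:
        """Remember the root-to-leaf path of position f."""
        self._check(f)
        self.finger_path = [(self.start, 0)]
        self._descend(self.finger_path, f)

    def access_from_finger(self, i: int) -> str:
        """Climb from the finger's leaf to the deepest ancestor covering i,
        then descend. The finger itself is left unchanged (static variant)."""
        if not self.finger_path:
            raise ValueError("setfinger must be called first")
        self._check(i)
        k = len(self.finger_path) - 1
        while True:
            x, off = self.finger_path[k]
            if off <= i < off + self.size[x]:
                break
            k -= 1  # the root covers every valid i, so this terminates
        return self._descend([(x, off)], i)


# Hypothetical example grammar generating the string "abaababa".
rules: Dict[str, Rule] = {
    "X1": "a", "X2": "b",
    "X3": ("X1", "X2"), "X4": ("X3", "X1"),
    "X5": ("X4", "X3"), "X6": ("X5", "X4"),
}
g = SLP(rules, "X6")
g.setfinger(4)
print("".join(g.access_from_finger(i) for i in range(8)))  # prints abaababa
```

In this sketch a query close to the finger often shares a long prefix of the stored path and so touches few nodes, but nothing guarantees that; obtaining O(log D) for every query in linear space is the contribution the summary above describes.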

Related Work
Our results
Technical Overview
Longest Common Extensions
Preliminaries
Fringe Access
Data Structure
Improving the Query Time for Small Indices
Static Finger Search
Dynamic Finger Search
Left Heavy Path Decomposition of a Path
