Re-Pair in Small Space

Dominik Köppl, Yoshimasa Takabatake, Keisuke Goto, Tomohiro I, Kensuke Sakai, Isamu Furuya

doi:10.3390/a14010005

Abstract

Re-Pair is a grammar compression scheme with favorably good compression rates. Computing Re-Pair, however, requires maintaining large frequency tables, which makes it hard to apply to large-scale data sets. As a solution to this problem, we present, given a text of length $n$ whose characters are drawn from an integer alphabet of size $\sigma = n^{O(1)}$, an $O(\min(n^2, n^2 \lg \log_\tau n \lg \lg \lg n / \log_\tau n))$ time algorithm computing Re-Pair with $\max((n/c) \lg n, n \lg \tau) + O(\lg n)$ bits of working space including the text space, where $c \geq 1$ is a fixed user-defined constant and $\tau$ is the sum of $\sigma$ and the number of non-terminals. We give variants of our solution working in parallel or in the external memory model. Unfortunately, the algorithm does not seem practical, since a preliminary version already needs roughly one hour to compute Re-Pair on one megabyte of text.

Highlights

Re-Pair [1] is a grammar deriving a single string.
Besides the seminal work of Larsson and Moffat [1], there are a couple of articles devoted to the compression aspects of Re-Pair: given a text $T$ of length $n$ whose characters are drawn from an integer alphabet of size $\sigma := n^{O(1)}$, the output of Re-Pair applied to $T$ is at most $2nH_k(T) + o(n \lg \sigma)$ bits with $k = o(\log_\sigma n)$ when represented naively as a list of character pairs [2], where $H_k$ denotes the empirical entropy of the $k$-th order.
We focus on the problem of computing the grammar with an algorithm working in text space, forming a bridge between the domain of in-place string algorithms, low-memory compression algorithms, and the domain of Re-Pair computing algorithms.

Summary

Introduction

Re-Pair [1] is a grammar deriving a single string. It is computed by replacing the most frequent bigram in this string with a new non-terminal, recursing until no bigram occurs more than once. Re-Pair is a so-called irreducible grammar; its grammar size, i.e., the total number of symbols on the right-hand sides of all rules, is upper bounded by $O(n / \log_\sigma n)$ ([3], Lemma 2), which matches the information-theoretic lower bound on the size of a grammar for a string of length $n$. Charikar et al. [6] (Section G) gave an easy variation to improve the size of the grammar. Another variant, proposed by Claude and Navarro [12], runs in a user-defined working space ($> n \lg n$ bits) and shares with our proposed solution the idea of a table that (a) is stored with the text in the working space and (b) grows in rounds. Furuya et al. [18] presented a variant, called MR-Re-Pair, in which a most frequent maximal repeat is replaced instead of a most frequent bigram.
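
To make the replacement loop concrete, the following is a minimal Python sketch of the textbook procedure: find a most frequent bigram, replace its non-overlapping occurrences with a fresh non-terminal, and repeat until no bigram occurs twice. It is neither the small-space algorithm of this paper nor the linear-time method of Larsson and Moffat [1]; it simply recounts all bigrams in every round. The function names (count_bigrams, replace_bigram, naive_repair, expand), the use of integers as non-terminals, and the tie-breaking rule (the first bigram encountered wins) are assumptions made for illustration only.

```python
from collections import Counter


def count_bigrams(seq):
    """Count non-overlapping occurrences of every bigram, left to right."""
    counts = Counter()
    prev = -2  # start position of the most recently counted occurrence
    for i in range(len(seq) - 1):
        bigram = (seq[i], seq[i + 1])
        # In a run like "aaa", the occurrence at i overlaps the one at i - 1;
        # skip it if the occurrence at i - 1 was counted.
        if bigram[0] == bigram[1] and i > 0 and seq[i - 1] == seq[i] and prev == i - 1:
            continue
        counts[bigram] += 1
        prev = i
    return counts


def replace_bigram(seq, bigram, symbol):
    """Replace non-overlapping occurrences of bigram by symbol, left to right."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == bigram:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def naive_repair(text):
    """Return (start sequence, rules); rules maps non-terminal -> (left, right)."""
    seq = list(text)   # terminals are single-character strings
    rules = {}         # non-terminals are integers 0, 1, 2, ... (an assumption)
    while True:
        counts = count_bigrams(seq)
        if not counts:
            break
        # Ties are broken in favor of the bigram seen first (an assumption).
        bigram, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:
            break
        nonterminal = len(rules)
        rules[nonterminal] = bigram
        seq = replace_bigram(seq, bigram, nonterminal)
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the string it derives."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)
```

For example, naive_repair("abracadabra") yields a start sequence together with a set of rules, and joining expand(s, rules) over the symbols s of the start sequence restores the input. Each round costs time linear in the current sequence length, so the sketch takes quadratic time in the worst case; its frequency table is exactly the kind of auxiliary structure whose size the paper aims to bound.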

Related Work

Our Contribution

Preliminaries

Sequential Algorithm

Trade-Off Computation

Algorithmic Ideas

Algorithmic Details

Storing the Output In-Place

Step-by-Step Execution

Implementation

Bit-Parallel Algorithm

Broadword Search

Bit-Parallel Adaption

Computing MR-Re-Pair in Small Space

Parallel Algorithm

Computing Re-Pair in External Memory

Heuristics for Practicality

Conclusions

Journal: Algorithms
Publication Date: Dec 25, 2020
License type: CC BY 4.0
