Grammar-based Compression Research Articles

Many problems in interprocedural program analysis can be modeled as the context-free language (CFL) reachability problem on graphs and can be solved in cubic time. Despite years of efforts, there are no known truly sub-cubic algorithms for this problem. We study the related certification task: given an instance of CFL reachability, are there small and efficiently checkable certificates for the existence and for the non-existence of a path? We show that, in both scenarios, there exist succinct certificates (O(n^2) in the size of the problem) and these certificates can be checked in subcubic (matrix multiplication) time. The certificates are based on grammar-based compression of paths (for reachability) and on invariants represented as matrix inequalities (for non-reachability). Thus, CFL reachability lies in nondeterministic and co-nondeterministic subcubic time. A natural question is whether faster algorithms for CFL reachability will lead to faster algorithms for combinatorial problems such as Boolean satisfiability (SAT). As a consequence of our certification results, we show that there cannot be a fine-grained reduction from SAT to CFL reachability for a conditional lower bound stronger than n^ω, unless the nondeterministic strong exponential time hypothesis (NSETH) fails. In a nutshell, reductions from SAT are unlikely to explain the cubic bottleneck for CFL reachability. Our results extend to related subcubic equivalent problems: pushdown reachability and 2NPDA recognition; as well as to all-pairs CFL reachability. For example, we describe succinct certificates for pushdown non-reachability (inductive invariants) and observe that they can be checked in matrix multiplication time. We also extract a new hardest 2NPDA language, capturing the “hard core” of all these problems.
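For orientation, the cubic-time baseline mentioned in the abstract can be sketched as a standard worklist saturation over a grammar in a binary normal form. This is only an illustrative sketch, not the certificate construction from the paper; the function name `cfl_reach`, the input encoding, and the balanced-parentheses example grammar are all assumptions made for this example.

```python
from collections import deque

def cfl_reach(n, edges, eps, unary, binary):
    """Saturate facts (A, u, v): "some path u -> v spells a word derived from A".

    Assumed (hypothetical) encoding:
      edges  : iterable of (u, terminal, v) with nodes in range(n)
      eps    : nonterminals A with a rule A -> epsilon
      unary  : dict terminal -> nonterminals A with a rule A -> terminal
      binary : list of (A, B, C) for rules A -> B C
    """
    facts, work = set(), deque()

    def add(a, u, v):
        if (a, u, v) not in facts:
            facts.add((a, u, v))
            work.append((a, u, v))

    for v in range(n):
        for a in eps:
            add(a, v, v)
    for u, t, v in edges:
        for a in unary.get(t, ()):
            add(a, u, v)

    out = {}  # (X, u) -> targets v of already processed facts (X, u, v)
    inc = {}  # (X, v) -> sources u of already processed facts (X, u, v)
    while work:
        x, u, v = work.popleft()
        out.setdefault((x, u), set()).add(v)
        inc.setdefault((x, v), set()).add(u)
        for a, b, c in binary:
            if b == x:  # new fact is the left part of A -> B C
                for w in out.get((c, v), ()):
                    add(a, u, w)
            if c == x:  # new fact is the right part of A -> B C
                for w in inc.get((b, u), ()):
                    add(a, w, v)
    return facts

# Example: balanced parentheses, S -> eps | S S | L T, T -> S R, L -> "(", R -> ")"
edges = [(0, "(", 1), (1, "(", 2), (2, ")", 3), (3, ")", 4)]
facts = cfl_reach(
    5, edges,
    eps={"S"},
    unary={"(": ["L"], ")": ["R"]},
    binary=[("S", "S", "S"), ("S", "L", "T"), ("T", "S", "R")],
)
```

On this path graph, node 4 is S-reachable from node 0 (the word is "(())"), while node 3 is not (the word "(()" is unbalanced).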


Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. Given a grammar, the random access problem is to compactly represent the grammar while supporting random access, that is, given a position in the original uncompressed string, report the character at that position. In this paper we study the random access problem with the finger search property, that is, the time for a random access query should depend on the distance between a specified index f, called the finger, and the query index i. We consider both a static variant, where we first place a finger and subsequently access indices near the finger efficiently, and a dynamic variant, which also supports moving the finger in time that depends on the distance moved. Let n be the size of the grammar, and let N be the size of the string. For the static variant we give a linear space representation that supports placing the finger in O(log N) time and subsequently accessing in O(log D) time, where D is the distance between the finger and the accessed index. For the dynamic variant we give a linear space representation that supports placing the finger in O(log N) time and accessing and moving the finger in O(log D + log log N) time. Compared to the best linear space solution to random access, we improve an O(log N) query bound to O(log D) for the static variant and to O(log D + log log N) for the dynamic variant, while maintaining linear space. As an application of our results we obtain an improved solution to the longest common extension problem in grammar compressed strings. To obtain our results, we introduce several new techniques of independent interest, including a novel van Emde Boas style decomposition of grammars.
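To make the random access problem concrete, the following minimal sketch stores a string as a straight-line grammar and answers access queries by descending from the start symbol using precomputed expansion lengths. It runs in time proportional to the height of the grammar, so it is the naive baseline rather than the O(log N) or finger-search structures described in the abstract; the toy grammar `RULES` and the helper names are assumptions made for this example.

```python
from functools import lru_cache

# Hypothetical toy grammar: each nonterminal derives either one character
# or the concatenation of two other symbols.
RULES = {
    "A": "a",
    "B": "b",
    "C": ("A", "B"),  # ab
    "D": ("C", "C"),  # abab
    "E": ("D", "C"),  # ababab
    "S": ("E", "D"),  # ababababab
}

@lru_cache(maxsize=None)
def length(sym):
    """Length of the string derived from sym (memoized per symbol)."""
    rhs = RULES[sym]
    if isinstance(rhs, str):
        return 1
    left, right = rhs
    return length(left) + length(right)

def access(sym, i):
    """Return character i (0-based) of the expansion of sym without expanding it."""
    if not 0 <= i < length(sym):
        raise IndexError(i)
    while True:
        rhs = RULES[sym]
        if isinstance(rhs, str):
            return rhs
        left, right = rhs
        if i < length(left):
            sym = left            # the position lies in the left child
        else:
            i -= length(left)     # shift the index into the right child
            sym = right

def expand(sym):
    """Fully decompress sym; used here only to cross-check access()."""
    rhs = RULES[sym]
    if isinstance(rhs, str):
        return rhs
    return expand(rhs[0]) + expand(rhs[1])
```

A finger-search structure would additionally remember the root-to-leaf path of the finger so that nearby queries avoid restarting from the start symbol; that part is not attempted here.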


Articles published on Grammar-based Compression

Survey of Grammar-Based Data Structure Compression

GRDF: An Efficient Compressor with Reduced Structural Regularities That Utilizes gRePair.

Subcubic certificates for CFL reachability

A Compact Representation of Indoor Trajectories

Lempel-Ziv Parsing for Sequences of Blocks

Grammar-based compression and its use in symbolic music analysis

An investigation of music analysis by the application of grammar-based compressors

Balancing Straight-line Programs

The Smallest Grammar Problem Revisited

Faster & strong: string dictionary compression using sampling and fast vectorized decompression

Grammar-Based Compression of Unranked Trees

RePair and All Irreducible Grammars are Upper Bounded by High-Order Empirical Entropy

A separation between RLSLPs and LZ77

Euler String-Based Compression of Tree-Structured Data and its Application to Analysis of RNAs

Finger Search in Grammar-Compressed Strings

Using Adaptive Automata in Grammar Based Text Compression to Identify Frequent Substrings

Approximation of grammar-based compression via recompression

Grammar-based compression approach to extraction of common rules among multiple trees of glycans and RNAs.

Random Access to Grammar-Compressed Strings and Trees

XPath Node Selection over Grammar-Compressed Trees
