Abstract

The smallest grammar problem—namely, finding a smallest context-free grammar that generates exactly one sequence—is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression and pattern discovery. We propose a new perspective on this problem by splitting it into two tasks: (1) choosing which words will be the constituents of the grammar and (2) searching for the smallest grammar given this set of constituents. We show how to solve the second task in polynomial time by parsing longer constituents with smaller ones. We propose new algorithms, based on classical practical algorithms, that use this optimization to find small grammars. Our algorithms consistently find smaller grammars on a classical benchmark, reducing the size by 10% in some cases. Moreover, our formulation allows us to define interesting bounds on the number of small grammars and to empirically compare different grammars of small size.
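
To make the second task concrete, the following is a minimal sketch of how a smallest parse can be computed once the set of constituents is fixed: the sequence (and, in the same way, each constituent) is parsed with strictly smaller constituents by a shortest-path dynamic program over string positions. This is an illustration of the idea only, not the authors' implementation; the function name minimal_parse and its interface are ours.

```python
def minimal_parse(s, constituents):
    """Smallest sequence of symbols (terminals or whole constituent words)
    whose concatenation equals s; only constituents strictly shorter than s
    are used, so longer words are parsed with smaller ones.  A sketch of the
    minimal-grammar-parsing idea, not the paper's exact implementation."""
    usable = [w for w in constituents if len(w) < len(s)]
    n = len(s)
    INF = float("inf")
    dist = [INF] * (n + 1)    # dist[i]: fewest symbols covering s[:i]
    back = [None] * (n + 1)   # back-pointer: (previous position, symbol used)
    dist[0] = 0
    for i in range(n):
        if dist[i] == INF:
            continue
        # option 1: emit the single terminal s[i]
        if dist[i] + 1 < dist[i + 1]:
            dist[i + 1] = dist[i] + 1
            back[i + 1] = (i, s[i])
        # option 2: emit one non-terminal for a constituent occurrence at i
        for w in usable:
            j = i + len(w)
            if j <= n and s[i:j] == w and dist[i] + 1 < dist[j]:
                dist[j] = dist[i] + 1
                back[j] = (i, w)
    # recover the optimal parse from the back-pointers
    parse, i = [], n
    while i > 0:
        i, symbol = back[i]
        parse.append(symbol)
    return list(reversed(parse))

# Example: minimal_parse("abcabdabcabd", ["abcabd", "abc", "ab"])
# returns ["abcabd", "abcabd"] (two non-terminal symbols instead of 12 terminals).
```

Applied to the sequence and to every chosen constituent, such a parse yields the right-hand sides of a grammar for that choice of constituents; this is, in spirit, the operation written mgp({s} ∪ Q) later in this summary.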

Highlights

  • The smallest grammar problem—namely, finding a smallest context-free grammar that generates exactly one sequence—is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression and pattern discovery. The size of a smallest grammar can be considered a computable variant of Kolmogorov complexity, in which the Turing machine description of the sequence is restricted to context-free grammars. The problem is decidable, but still hard: finding a smallest grammar is NP-hard [1]

  • Note that no Iterative Repeat Replacement (IRR) algorithm could generate G∗ and, by enumeration, we find that the smallest possible grammar that can be obtained with an IRR algorithm has size 46 + |Gmin(α)| + |Gmin(β)| + …

  • We analyzed a new approach to the Smallest Grammar Problem, which consisted in optimizing separately the choice of words that are going to be constituents, and the choice of which occurrences of these constituents will be rewritten by non-terminals

Summary

Introduction

The smallest grammar problem—namely, finding a smallest context-free grammar that generates exactly one sequence—is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression and pattern discovery. In order to derive a score function corresponding to COMPRESSIVE, note that replacing a word ω by a non-terminal contracts the grammar by (|ω| − 1) ∗ oP(ω) symbols, while including the new rule in the grammar adds |ω| + 1 to the grammar size. This defines f(ω, P) = fMC(ω, P) = (|ω| − 1) ∗ (oP(ω) − 1) − 2. Once an IRR algorithm has chosen a repeated word ω, it replaces all non-overlapping occurrences of that word in the current grammar by a new non-terminal N and adds N → ω to the set of production rules. If Q is a subset of the repeats of the sequence s, we denote by mgp({s} ∪ Q) the set of production rules P corresponding to one of the minimal grammar parsings of {s} ∪ Q. In the paper's example, this means gaining 9 symbols and losing only 6 (because of the introduction of the new right-hand sides).
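
As a hedged illustration of the score function and of the IRR replacement step described above, the toy Python sketch below repeatedly replaces the best-scoring repeat by a fresh non-terminal. It assumes terminals are lower-case characters and represents every non-terminal as a single upper-case letter, so that symbol counts equal string lengths; the names occurrences, repeats, f_mc and irr_mc are ours, and the code reconstructs the general IRR scheme driven by the fMC score rather than the authors' implementation.

```python
import re

def occurrences(rhs_list, word):
    """Non-overlapping occurrence count oP(word) over all right-hand sides."""
    return sum(len(re.findall(re.escape(word), rhs)) for rhs in rhs_list)

def repeats(rhs_list):
    """All words of length >= 2 that occur at least twice (non-overlapping)
    in the current right-hand sides; quadratic enumeration, fine for a toy."""
    cands = {rhs[i:j] for rhs in rhs_list
             for i in range(len(rhs)) for j in range(i + 2, len(rhs) + 1)}
    return {w for w in cands if occurrences(rhs_list, w) >= 2}

def f_mc(word, rhs_list):
    """Compressive score: contraction (|w| - 1) * oP(w) minus the |w| + 1
    symbols of the new rule, i.e. (|w| - 1) * (oP(w) - 1) - 2."""
    return (len(word) - 1) * (occurrences(rhs_list, word) - 1) - 2

def irr_mc(sequence):
    """Toy Iterative Repeat Replacement driven by f_MC: repeatedly rewrite
    the best-scoring repeat with a fresh non-terminal while the score is > 0."""
    rules = {"S": sequence}
    # single-letter non-terminals; 'S' is reserved for the start symbol (toy assumption)
    fresh = iter("NOPQRTUVWXYZABCDEFGHIJKLM")
    while True:
        rhs_list = list(rules.values())
        scored = [(f_mc(w, rhs_list), w) for w in repeats(rhs_list)]
        if not scored:
            break
        gain, best = max(scored)
        if gain <= 0:
            break  # no replacement shrinks the grammar any further
        nt = next(fresh)
        # replace all non-overlapping occurrences, then add the new rule nt -> best
        rules = {lhs: rhs.replace(best, nt) for lhs, rhs in rules.items()}
        rules[nt] = best
    return rules
```

For instance, irr_mc("how much wood would a woodchuck chuck") introduces one rule for the repeat "chuck" and one for " wood", after which no remaining repeat has a positive fMC score.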

Experiments
Findings
Conclusions and Future Work
