Abstract

In the smallest grammar problem, we are given a word w and we want to compute a preferably small context-free grammar G for the singleton language {w} (where the size of a grammar is the sum of the sizes of its rules, and the size of a rule is measured by the length of its right side). It is known that, for unbounded alphabets, the decision variant of this problem is NP-hard and the optimisation variant does not allow a polynomial-time approximation scheme, unless P = NP. We settle the long-standing open problem of whether these hardness results also hold for the more realistic case of a constant-size alphabet. More precisely, it is shown that the smallest grammar problem remains NP-complete (and its optimisation version is APX-hard), even if the alphabet is fixed and has size at least 17. The corresponding reduction is robust in the sense that it also works for an alternative size measure of grammars that is commonly used in the literature (i.e., a size measure that also takes the number of rules into account), and it also allows us to conclude that even computing the number of rules required by a smallest grammar is a hard problem. On the other hand, if the number of nonterminals (or, equivalently, the number of rules) is bounded by a constant, then the smallest grammar problem can be solved in polynomial time, which is shown by encoding it as a problem on graphs with interval structure. However, treating the number of rules as a parameter (in terms of parameterised complexity) yields W[1]-hardness. Furthermore, we present an O(3^{|w|}) exact exponential-time algorithm, based on dynamic programming. These three main questions are also investigated for 1-level grammars, i.e., grammars for which only the start rule contains nonterminals on its right side, thus investigating the impact of the “hierarchical depth” of grammars on the complexity of the smallest grammar problem. In this regard, we obtain similar, but slightly stronger, results for 1-level grammars.

Highlights

  • Context-free grammars are among the most classical concepts in theoretical computer science

  • From a formal-languages point of view, describing a single word by a context-free grammar seems excessive, but there are at least two evident motivations: – Compression perspective: the grammar G is a compressed representation of the word w. – Inference perspective: the grammar G identifies the hierarchical structure of the word w

  • The inference perspective of computing grammars for single words has also been applied in two PhD theses: by de Marcken [3], in order to investigate whether analysing the structure of small grammars for large English texts could help in understanding the structure of the language itself, and by Galle [4], in order to infer hierarchical structures in DNA


Summary

Introduction

Context-free grammars are among the most classical concepts in theoretical computer science. We are concerned with grammars G that describe singleton languages {w} (or, by slightly abusing notation, grammars describing single words).
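To make the notion concrete, here is a minimal sketch (an illustrative example, not taken from the paper) of a grammar describing a single word, using the size measure from the abstract: the sum of the lengths of all right-hand sides. Uppercase letters stand for nonterminals and lowercase letters for terminals; the rule set below is a hypothetical example grammar for w = "abababab".

```python
def expand(grammar, symbol):
    """Derive the unique word generated from `symbol`.

    A symbol with no rule is treated as a terminal and returned as-is;
    otherwise its right-hand side is expanded recursively.
    """
    rhs = grammar.get(symbol)
    if rhs is None:  # terminal symbol
        return symbol
    return "".join(expand(grammar, s) for s in rhs)


def grammar_size(grammar):
    """Grammar size: the sum of the lengths of all right-hand sides."""
    return sum(len(rhs) for rhs in grammar.values())


# Example grammar for the single word w = "abababab":
#   S -> BB, B -> AA, A -> ab
g = {"S": "BB", "B": "AA", "A": "ab"}

w = expand(g, "S")
print(w)                # abababab
print(grammar_size(g))  # 6, i.e., smaller than |w| = 8
```

The grammar generates exactly one word, and its size (2 + 2 + 2 = 6) is smaller than the length of the word itself, which illustrates the compression perspective; the nesting of the rules illustrates the hierarchical structure exploited by the inference perspective.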

Grammars as Inference Tools and Compressors
Algorithmics on Compressed Strings
The Smallest Grammar Problem
Our Contribution
Outline of the Paper
Preliminaries
Basic Concepts of Graph Theory and Complexity Theory
Grammars
Examples
NP-Hardness of Computing Smallest Grammars for Fixed Alphabets
The 1-Level Case
The Multi-Level Case
Extensions of the Reductions
Smallest Grammars with a Bounded Number of Nonterminals
Related Questions
Exact Exponential-Time Algorithms
Small Alphabets
Approximation
Parameterised Complexity
Findings
A More Abstract View

