Parsing of General Context-Free Languages

Susan L. Graham, Michael A. Harrison

doi:10.1016/s0065-2458(08)60451-9

Abstract

One of the major advances both in the study of natural languages and in the use of newly defined languages, such as programming languages, came with the realization that one required a formal and precise mechanism for generating the infinite set of strings of a language. Both programming linguists and natural linguists independently formulated the notion of a context-free grammar as an important generative schema. This chapter focuses on the recognition problem, that is, deciding whether a given string belongs to the language generated by a grammar, and on the related problem of “parsing,” which means finding a derivation tree for a string in the language. A variety of methods are now known for parsing classes of context-free grammars. In some sense, the crudest method is systematic trial and error, that is, a deterministic simulation of the nondeterministic choice of next steps in a derivation. However, such a simulation can require a number of steps that is exponential in the length of the string being analyzed. The chapter restricts its attention to classes of grammars that are rich enough to generate all the context-free languages, and it concentrates on three parsing algorithms for such classes. It shows that each method parses a class of grammars sufficiently large to generate all the context-free languages. Furthermore, each method has a time bound that is shown to be at worst cubic in the length of the string being parsed. The three methods are presented within a consistent framework and notation so that both their similarities and their differences can be understood.
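To make the cubic time bound concrete, the following is a minimal sketch of a table-driven recognizer in the style of the well-known Cocke-Younger-Kasami (CYK) algorithm. The abstract does not name the three methods, so the choice of CYK here is an assumption made purely for illustration. The function name `cyk_recognize`, the rule encodings `unary_rules` and `binary_rules`, and the example grammar for the language a^n b^n are all hypothetical. The sketch assumes a grammar in Chomsky normal form and ignores the empty string.

```python
from itertools import product


def cyk_recognize(word, unary_rules, binary_rules, start="S"):
    """Return True if `word` can be derived from `start`.

    Assumes a grammar in Chomsky normal form (an assumption of this sketch):
      unary_rules:  terminal -> set of nonterminals A with a rule A -> terminal
      binary_rules: (B, C)   -> set of nonterminals A with a rule A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # the empty string is not handled in this sketch

    # table[i][length] holds the nonterminals that derive word[i:i + length].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]

    # Substrings of length 1 are covered by the terminal rules.
    for i, symbol in enumerate(word):
        table[i][1] = set(unary_rules.get(symbol, ()))

    # Three nested loops over length, start position, and split point
    # give a number of steps cubic in n for a fixed grammar.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for b, c in product(left, right):
                    table[i][length] |= binary_rules.get((b, c), set())

    return start in table[0][n]


# Hypothetical example grammar for { a^n b^n : n >= 1 }:
#   S -> A B | A T,  T -> S B,  A -> a,  B -> b
AN_BN_UNARY = {"a": {"A"}, "b": {"B"}}
AN_BN_BINARY = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}

if __name__ == "__main__":
    for candidate in ["ab", "aabb", "aab", "ba"]:
        print(candidate, cyk_recognize(candidate, AN_BN_UNARY, AN_BN_BINARY))
```

In contrast to the trial-and-error simulation described above, the table records each substring's possible derivations once, which is what avoids repeating work an exponential number of times. This sketch only recognizes; recovering a derivation tree would additionally require storing, for each table entry, which rule and split point produced it.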

Full Text