A formalisation of the theory of context-free languages in higher order logic

Aditi Barthwal

doi:10.25911/5d6e4c3c4b543

Abstract

We present a formalisation of the theory of context-free languages using the HOL4 theorem prover. The formalisation of this theory is not only interesting in its own right, but also gives insight into the kind of manipulations required to port a pen-and-paper proof to a theorem prover. The mechanisation proves to be an ideal case study of how intuitive textbook proofs can blow up in size and complexity, and how details from the textbook can change during formalisation. The mechanised theory provides the groundwork for our subsequent results about SLR parser generation. The theorems, even though well-established in the field, are interesting for the way they have to be “reproven” in a theorem prover. Proofs must be recast to be concrete enough for the prover: patching deductive gaps which are relatively easily grasped in a text proof, but beyond the automatic capabilities of contemporary tools. The library of proofs, techniques and notations developed here provides a basis from which further work on verified language theory can proceed at a quickened pace. We have mechanised classical results involving context-free grammars and pushdown automata. These include but are not limited to the equivalence between those two formalisms, the normalisation of CFGs, and the pumping lemma for proving a language is not context-free. As an application of this theory, we describe the verification of SLR parsing. Among the various properties proven about the parser we show, in particular, soundness: if the parser results in a parse tree on a given input, then the parse tree is valid with respect to the grammar, and the leaves of the parse tree match the input; and completeness: if the input belongs in the language of the grammar then the parser constructs the correct parse tree for the input with respect to the grammar. In addition, we develop a version of the algorithm that is executable by automatic translation from HOL to SML. This alternative version of the algorithm requires some interesting termination proofs. We conclude with a discussion of the issues we faced while mechanising pen-and-paper proofs. Carefully written formal proofs are regarded as rigorous for the audience they target. But when such proofs are implemented in a theorem prover, the level of detail required increases dramatically. We provide a discussion and a broad categorisation of the causes that give rise to this.

Full Text