Standard Generalized Markup Language: Mathematical and philosophical issues

Derick Wood

doi:10.1007/bfb0015253

Abstract

The Standard Generalized Markup Language (SGML), an ISO standard, has become the accepted method of defining markup conventions for text files. SGML is a metalanguage for defining grammars for textual markup in much the same way that Backus-Naur Form is a metalanguage for defining programming-language grammars. Indeed, HTML, the method of marking up a hypertext documents for the World Wide Web, is an SGML grammar. The underlying assumptions of the SGML initiative are that a logical structure of a document can be identified and that it can be indicated by the insertion of labeled matching brackets (start and end tags). Moreover, it is assumed that the nesting relationships of these tags can be described with an extended context-free grammar (the right-hand sides of productions are regular expressions). In this survey of some of the issues raised by the SGML initiative, I reexamine the underlying assumptions and address some of the theoretical questions that SGML raises. In particular, I respond to two kinds of questions. The first kind are technical: Can we decide whether tag minimization is possible? Can we decide whether a proposed content model is legal? Can we remove exceptions in a structure preserving manner? Can we decide whether two SGML grammars are equivalent?

Full Text