Abstract The Tomita’s Generalized LR ( 1 ) parsing algorithm (GLR), later improved in many ways, runs in a linear time on LR ( 1 ) grammars, and degrades to a polynomial-time bound if the grammar is not deterministic. We address a useful feature not present in the current GLR ( 1 ) methods: the ability to accept grammars of the Extended BNF type (EBNF), the rules of which contain regular expressions. An EBNF grammar is conveniently represented by a collection of finite automata called a Transition Net (TN). We define, analyze and evaluate a new GLR ( 1 ) algorithm, called GELR, that combines the recent LR ( 1 ) parsing algorithm for TNs with the classical GLR data structures: the Graph-Structured Stack representing multiple stacks, and the Shared Packed Parse Forest for multiple syntax trees. The GELR algorithm is proved correct and an efficient implementation incorporating the state-of-the-art Right-Nulled parsing optimization is available. Experimental measures of the GELR parser size, speed and memory footprint are reported for current programming and web languages, and are compared with those of other parsing algorithms. The findings prove that directly parsing EBNF grammars does not penalize speed. Performance comparisons for different computer languages should also be of interest.
Read full abstract