Mealy machines are a better model of lexical analyzers

Wuu Yang

doi:10.1016/0096-0551(96)00003-3

Abstract

Lexical analyzers partition input characters into tokens. When ambiguities arise during lexical analysis, the longest-match rule is generally adopted to resolve them. The longest-match rule causes the look-ahead problem in traditional lexical analyzers, which are based on Moore machines. In a Moore machine, output tokens are associated with the states of the automaton. By contrast, because a Mealy machine associates output tokens with state transitions, the look-ahead behavior can be encoded in its state transition table. Therefore, we believe that lexical analyzers should be based on Mealy machines, rather than Moore machines, in order to solve the look-ahead problem. We propose techniques to construct Mealy machines from regular expressions and to perform sequential and data-parallel lexical analysis with these Mealy machines.
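
To make the contrast concrete, the following is a minimal sketch of a sequential lexer whose tokens are emitted on transitions rather than in states. It is not the construction proposed in the paper: the transition table is written by hand instead of being derived from regular expressions, and the token kinds (`ID`, `NUM`), the state names, the `EOF` sentinel, and the `char_class` and `tokenize` helpers are all assumptions chosen for illustration. The point it shows is that the character that ends a token is consumed by the same transition that emits the token, so the scanner never backs up.

```python
from typing import Dict, List, Optional, Tuple

EOF = "\0"  # hypothetical end-of-input sentinel, assumed absent from the input


def char_class(ch: str) -> str:
    """Map a character to the input alphabet of the machine."""
    if ch == EOF:
        return "eof"
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch.isspace():
        return "space"
    raise ValueError(f"unexpected character {ch!r}")


# Mealy-style table: (state, input class) -> (next state, token emitted or None).
# The output is attached to the transition, not to the state.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, Optional[str]]] = {
    ("start", "letter"): ("in_id", None),
    ("start", "digit"): ("in_num", None),
    ("start", "space"): ("start", None),
    ("start", "eof"): ("start", None),
    ("in_id", "letter"): ("in_id", None),
    ("in_id", "digit"): ("in_id", None),
    ("in_id", "space"): ("start", "ID"),
    ("in_id", "eof"): ("start", "ID"),
    ("in_num", "digit"): ("in_num", None),
    # A letter ends the number and, in the same step, starts an identifier.
    ("in_num", "letter"): ("in_id", "NUM"),
    ("in_num", "space"): ("start", "NUM"),
    ("in_num", "eof"): ("start", "NUM"),
}


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Scan text left to right, reading each character exactly once."""
    tokens: List[Tuple[str, str]] = []
    state = "start"
    lexeme_start = 0
    for i, ch in enumerate(text + EOF):
        next_state, emitted = TRANSITIONS[(state, char_class(ch))]
        if emitted is not None:
            # The token ends just before the current character.
            tokens.append((emitted, text[lexeme_start:i]))
        if state == "start" or emitted is not None:
            # The current character may begin the next lexeme.
            lexeme_start = i
        state = next_state
    return tokens


if __name__ == "__main__":
    print(tokenize("ab1 42x"))
```

Under these assumed rules, the input `ab1 42x` yields an `ID` for `ab1`, a `NUM` for `42`, and an `ID` for `x`. A state-output (Moore-style) scanner for the same rules would typically reach the accepting state for `42`, read `x` to confirm that the number cannot be extended, and then push `x` back before restarting; here the `("in_num", "letter")` entry performs both steps in one transition.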
