Revisiting State Blow-Up: Automatically Building Augmented-FA While Preserving Functional Equivalence

Xiaodong Yu,Michela Becchi,Bill Lin

doi:10.1109/jsac.2014.2358840

Abstract

Regular expression matching, a central task in deep packet inspection and other networking applications, has been traditionally implemented through finite automata. Thanks to their limited per-character processing and memory bandwidth requirements, deterministic finite automata (DFA) are a natural choice for memory-based implementations. In the presence of large datasets of complex patterns, however, DFA suffer from the well-known state explosion problem. Specifically, state explosion can take place during DFA generation when the considered patterns contain bounded and unbounded repetitions of wildcards or large character sets. Several alternative FA representations have been proposed to address this problem. However, these proposals all suffer from one or more of the following problems: some can avoid state explosion only on datasets of limited size and complexity; some have prohibitive worst-case memory bandwidth requirements or processing time; and some can only guarantee functional equivalence for restricted classes of regular expressions and require the user to manually filter out unsupported patterns. In this work we propose JFA, a finite automation that uses state variables to avoid state explosion, and is functionally equivalent to the corresponding DFA. Functional equivalence is guaranteed by construction without requiring user intervention. We also provide optimization techniques to both limit the amount of state variables required and provide a lower bound for the JFA traversal time.

Full Text