Space-time tradeoff in regular expression matching with semi-deterministic finite automata

Yi-Hua E Yang,Viktor K Prasanna

doi:10.1109/infcom.2011.5934986

Abstract

Regular expression matching (REM) with nondeterministic finite automata (NFA) can be computationally expensive when a large number of patterns are matched concurrently. On the other hand, converting the NFA to a deterministic finite automaton (DFA) can cause state explosion, where the number of states and transitions in the DFA are exponentially larger than in the NFA. In this paper, we seek to answer the following question: to match an arbitrary set of regular expressions, is there a finite automaton that lies between the NFA and DFA in terms of computation and memory complexities? We introduce the semi-deterministic finite automata (SFA) and the state convolvement test to construct an SFA from a given NFA. An SFA consists of a fixed number (p) of constituent DFAs (c-DFA) running in parallel; each c-DFA is responsible for a subset of states in the original NFA. To match a set of regular expressions with n overlapping symbols (that can match to the same input character concurrently), the NFA can require O(n) computation per input character, whereas the DFA can have a state transition table with O(2n) states. By exploiting the state convolvements during the SFA construction, an equivalent SFA reduces the computation complexity to O(p2=c2) per input character while limiting the space requirement to O(|Σ|×p2×(n=p)c) states, where Σ is the alphabet and c ≥ 1 is a small design constant. Although the problem of constructing the optimal (minimum-sized) SFA is shown to be NP-complete, we develop a greedy heuristic to quickly construct a near-optimal SFA in time and space quadratic in the number of states in the original NFA. We demonstrate our SFA construction using real-world regular expressions taken from the Snort IDS.

Full Text