Abstract

Regular Expression (RegEx) matching is the industry standard for Deep Packet Inspection (DPI) because RegExes are significantly more expressive than strings. To achieve high matching speed, we need to convert the RegExes to Deterministic Finite State Automata (DFA). However, DFA has the state explosion problem, that is, the number of DFA states and transitions can be exponential with the number of RegExes. Much work has addressed the DFA state explosion problem; however, none has met all the requirements of fast and automated construction, small memory image, and high matching speed. In this paper, we propose a decompositional approach, with fast and automated construction, small memory image, and high matching speed, to DFA state explosion. The first key idea is to decompose a complex RegEx that cause exponential state increases into a set of simpler RegExes that do not cause exponential state increases, where any character string that matches the complex RegEx also matches all the RegExes in the set of simpler RegExes; that is, the set of strings that match the complex RegEx is a subset of strings that match the set of simpler RegExes. The second key idea is to use a stateful post-processing engine to filter the matches that are actually the matches of the complex RegEx. Given an input string for matching, instead of using the large DFA constructed from the original complex RegEx to perform the matching, we first use the small DFA constructed from the set of simpler RegExes to perform the matching, and then, if the small DFA reports a match, we use the post-processing engine to determine whether it is a true match to the original complex RegEx. Because the pre-processing is simple, automaton construction can be automated and fast, and because most on-line processing is done by a DFA, its matching speed is close to that of a DFA alone. Our experimental results show that our decompositional approach achieves orders of magnitude faster DFA construction (in terms of seconds instead of minutes), 30 times smaller memory image, and 43% faster matching speeds, than state-of-the-art software based RegEx matching algorithms.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call