Regular expression (RegEx) matching, the core operation of intrusion detection and prevention systems, remains a fundamentally challenging problem. A desirable RegEx matching scheme should satisfy four requirements: deterministic finite state automaton (DFA) speed, nondeterministic finite state automaton (NFA) size, automated construction, and scalable construction. Despite extensive work on RegEx matching, no prior scheme satisfies all four of these requirements. In this paper, we approach this holy grail by proposing OverlayCAM, a RegEx matching scheme that satisfies all four requirements. The theoretical underpinning of our scheme is the overlay delayed input DFA, a new automata model proposed in this paper that captures both state replication and transition replication, which are inherent in DFAs. Our RegEx matching solution processes one input character per lookup like a DFA, requires only the space of an NFA, is grounded in sound automata models, is easy to deploy in existing network devices, and comes with scalable and automated construction algorithms.
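The tension between "DFA speed" and "NFA size" can be made concrete with a textbook example. The sketch below is not OverlayCAM or the overlay delayed input DFA; it is a minimal illustration, assuming a two-letter alphabet and the classic language "the n-th symbol from the end is `a`". All function names (`build_nfa`, `build_dfa`, and so on) are hypothetical and chosen only for this example. The NFA needs n + 1 states but must track a set of active states per input character, while the DFA obtained by subset construction does one table lookup per character at the cost of 2^n states.

```python
from itertools import chain


def build_nfa(n):
    """NFA for 'the n-th symbol from the end is a' over {a, b}; it has n + 1 states."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}  # state 0 loops and guesses the marked 'a'
    for i in range(1, n):
        for ch in "ab":
            delta[(i, ch)] = {i + 1}  # count the remaining n - 1 symbols
    return delta, 0, {n}  # state n accepts and has no outgoing transitions


def nfa_step(delta, states, ch):
    """Advance every active NFA state on one character (work grows with the set size)."""
    return frozenset(chain.from_iterable(delta.get((s, ch), ()) for s in states))


def nfa_match(nfa, text):
    delta, start, accepting = nfa
    states = frozenset({start})
    for ch in text:
        states = nfa_step(delta, states, ch)
    return bool(states & accepting)


def build_dfa(nfa, alphabet="ab"):
    """Subset construction: each reachable set of NFA states becomes one DFA state."""
    delta, start, accepting = nfa
    start_set = frozenset({start})
    table, seen, work = {}, {start_set}, [start_set]
    while work:
        cur = work.pop()
        for ch in alphabet:
            nxt = nfa_step(delta, cur, ch)
            table[(cur, ch)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return table, start_set, {s for s in seen if s & accepting}


def dfa_match(dfa, text):
    """One table lookup per input character (assumes text uses only the DFA's alphabet)."""
    table, state, accepting = dfa
    for ch in text:
        state = table[(state, ch)]
    return state in accepting


def dfa_state_count(dfa):
    return len({state for state, _ in dfa[0]})


nfa = build_nfa(3)
dfa = build_dfa(nfa)
print(nfa_match(nfa, "abb"), dfa_match(dfa, "abb"))  # both automata agree on every input
print(dfa_state_count(dfa))  # 8 DFA states versus 4 NFA states for n = 3
```

Raising `n` doubles the DFA state count at each step while the NFA grows by a single state, which is the kind of state and transition replication the abstract refers to.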