Optimistic regular expression matching on FPGAs for near-data processing

Andreas Becher,Jürgen Teich,Stefan Wildermann

doi:10.1145/3211922.3211926

Abstract

Regular expressions (regex) are the main means to search for specific patterns in the vast amount of stored textual information. As a consequence, different designs of hardware accelerators have been proposed that enable memory-bound regex processing. Here, the regular expression to be evaluated is translated to a non-deterministic (NFA) or deterministic finite automaton (DFA) which is then mapped onto the hardware design. The available hardware resources of the design imply the maximum size (in terms of amount of states and transitions) of the supported automata. However, regular expressions may be arbitrarily complex.As a remedy, we propose optimistic regular expression evaluation which follows the idea of pruning the DFA of a regex such that it fits the available hardware resources. Consequently, we obtain a DFA with not only matching states but also an uncertain state. Texts marked as uncertain have to be re-evaluated by software. This is particularly tailored to near-data processing where the optimistic regex evaluation is performed near the data source thus reducing the overall amount of data to be transmitted to the requesting host.A prototype is implemented within Google's RE2 meaning a complete coverage of RE2 supported regular expression for the proposed design. Regular expression evaluation of up to 2.66 GByte/s could be achieved on an FPGA-based Zynq SoC.

Full Text