Abstract

The Graphics Processing Unit (GPU) is a promising platform for implementing Deep Packet Inspection (DPI): its rich parallelism supports high throughput, and its programmability accommodates frequent pattern updates. However, achieving a high-performance implementation is challenging because GPU performance is highly sensitive to algorithmic and implementation issues such as memory overhead, thread divergence, and large lookup table sizes. In this paper, we propose algorithm and implementation co-optimization techniques that achieve high performance by reducing the required memory, removing thread divergence, optimizing memory access patterns, and optimizing for multithreading. To lower the implementation cost, a GPU performance model is developed to identify bottlenecks and guide the design of the GPU kernel. Based on these optimization techniques, a prototype DPI implementation reaches 150 Gb/s on a single NVIDIA K20 GPU.
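
The abstract does not describe the kernel structure itself, but a common baseline for GPU DPI is DFA-based multi-pattern matching (e.g., an Aho-Corasick automaton compiled to a dense transition table). The sketch below is a minimal illustration, under that assumption, of two of the optimization themes named above: branch-free accumulation of matches to avoid thread divergence, and routing read-only table lookups through the read-only cache of a Kepler-class GPU such as the K20. All identifiers (dpi_match, dfa_table, PACKET_LEN, and so on) are hypothetical and are not taken from the paper.

// Illustrative sketch only: a minimal DFA-based multi-pattern matching kernel
// in the style commonly used for GPU DPI. Not the paper's implementation.
#include <cstdint>
#include <cuda_runtime.h>

#define ALPHABET   256        // one transition-table column per input byte value
#define PACKET_LEN 1536       // fixed per-packet buffer stride (assumption)

// dfa_table: flattened [num_states x ALPHABET] next-state table.
// The high bit of an entry marks an accepting (pattern-matched) state,
// so the inner loop needs no extra branch for match detection.
__global__ void dpi_match(const uint32_t* __restrict__ dfa_table,
                          const uint8_t*  __restrict__ packets,
                          const int*      __restrict__ packet_len,
                          uint32_t*       match_flags,
                          int num_packets)
{
    int pkt = blockIdx.x * blockDim.x + threadIdx.x;
    if (pkt >= num_packets) return;      // only boundary threads diverge here

    const uint8_t* p   = packets + (size_t)pkt * PACKET_LEN;
    int            len = packet_len[pkt];

    uint32_t state = 0;                  // DFA start state
    uint32_t hit   = 0;
    for (int i = 0; i < len; ++i) {
        // __ldg routes the read-only table lookup through the read-only
        // data cache on compute capability 3.5+ devices (e.g., the K20).
        uint32_t e = __ldg(&dfa_table[state * ALPHABET + p[i]]);
        hit  |= e & 0x80000000u;         // branch-free accumulation of matches
        state = e & 0x7FFFFFFFu;         // strip the accept flag to get the state
    }
    match_flags[pkt] = hit ? 1u : 0u;
}

A one-thread-per-packet mapping like this keeps control flow uniform within a warp only when packets in a batch have similar lengths; binning packets by length or assigning a warp per packet are common alternatives when lengths vary widely, which is the kind of implementation trade-off the paper's performance model is intended to expose.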
