Abstract

The Graphics Processing Unit (GPU) is a promising platform for implementing Deep Packet Inspection (DPI): its rich parallelism supports high throughput, and its programmability accommodates frequent pattern updates. However, achieving a high-performance implementation is challenging because GPU performance is highly sensitive to algorithmic and implementation issues such as memory overhead, thread divergence, and large lookup table sizes. In this paper, we propose algorithm and implementation co-optimization techniques that achieve high performance by reducing the required memory, removing thread divergence, optimizing memory access patterns, and optimizing for multithreading. To lower the implementation cost, a GPU performance model is developed to identify bottlenecks and guide the design of the GPU kernel. Based on these optimization techniques, a prototype DPI implementation reaches 150 Gb/s on a single NVIDIA K20 GPU.
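
The abstract does not describe the kernel structure itself, but a common baseline for GPU DPI is DFA-based multi-pattern matching (e.g., an Aho-Corasick automaton compiled to a dense transition table). The sketch below is a minimal illustration, under that assumption, of two of the optimization themes named above: branch-free accumulation of matches to avoid thread divergence, and routing read-only table lookups through the read-only cache of a Kepler-class GPU such as the K20. All identifiers (dpi_match, dfa_table, PACKET_LEN, and so on) are hypothetical and are not taken from the paper.

// Illustrative sketch only: a minimal DFA-based multi-pattern matching kernel
// in the style commonly used for GPU DPI. Not the paper's implementation.
#include <cstdint>
#include <cuda_runtime.h>

#define ALPHABET   256        // one transition-table column per input byte value
#define PACKET_LEN 1536       // fixed per-packet buffer stride (assumption)

// dfa_table: flattened [num_states x ALPHABET] next-state table.
// The high bit of an entry marks an accepting (pattern-matched) state,
// so the inner loop needs no extra branch for match detection.
__global__ void dpi_match(const uint32_t* __restrict__ dfa_table,
                          const uint8_t*  __restrict__ packets,
                          const int*      __restrict__ packet_len,
                          uint32_t*       match_flags,
                          int num_packets)
{
    int pkt = blockIdx.x * blockDim.x + threadIdx.x;
    if (pkt >= num_packets) return;      // only boundary threads diverge here

    const uint8_t* p   = packets + (size_t)pkt * PACKET_LEN;
    int            len = packet_len[pkt];

    uint32_t state = 0;                  // DFA start state
    uint32_t hit   = 0;
    for (int i = 0; i < len; ++i) {
        // __ldg routes the read-only table lookup through the read-only
        // data cache on compute capability 3.5+ devices (e.g., the K20).
        uint32_t e = __ldg(&dfa_table[state * ALPHABET + p[i]]);
        hit  |= e & 0x80000000u;         // branch-free accumulation of matches
        state = e & 0x7FFFFFFFu;         // strip the accept flag to get the state
    }
    match_flags[pkt] = hit ? 1u : 0u;
}

A one-thread-per-packet mapping like this keeps control flow uniform within a warp only when packets in a batch have similar lengths; binning packets by length or assigning a warp per packet are common alternatives when lengths vary widely, which is the kind of implementation trade-off the paper's performance model is intended to expose.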
