Efficient parallel packet processing using a shared memory many-core processor with hardware support to accelerate communication

Farrukh Hijaz,Omer Khan,Peter Wilson,Brian Kahne

doi:10.1109/nas.2015.7255208

Abstract

Software IP forwarding routers provide flexibility, programmability and extensibility, while enabling fast deployment. The key question is whether they can keep up with the efficiency of special purpose hardware counterparts. Shared memory stands out as sine qua non for parallel programming of many commercial multicore processors, so it is the paradigm of choice to implement software routers. For efficiency, shared memory is often implemented with hardware support for cache coherence and data consistency among the cores. Although it enables efficient data access in many common case scenarios, the communication between cores using shared memory synchronization primitives often limits scalability. In this paper we perform a thorough characterization of a multithreaded packet processing application to quantify the opportunities from exploiting concurrency, as well as identify scalability bottlenecks in futuristic shared memory multicores. We propose to retain the shared memory model, however, introduce a set of lightweight in-hardware explicit messaging send/receive instructions in the instruction set architecture (ISA). These instructions are used to mitigate the overheads of multi-party communication in shared memory protocols. Using simulations of a 64 core multicore, we identify that scalability of parallel packet processing is limited due to packet ordering requirement that leads to expensive implicit communication under shared memory. Using explicit messaging support in the ISA, the communication bottleneck is mitigated, and the application scales to 30× at 64 cores.

Full Text