Abstract

The performance of packet processing applications depends on the memory access speed of network systems. Table lookup, one of the most common operations in packet processing applications, requires fast memory access and can become a dominant performance bottleneck. In Network Function Virtualization (NFV)-aware environments, therefore, the fast on-chip cache memories of a general-purpose CPU are critical for achieving high-performance packet processing at rates of tens of Gbps. Moreover, carrier network systems run multiple types of applications, including complex ones, on the same system simultaneously, which also demands large cache capacity. In this paper, we propose a packet processing architecture that uses interleaved three-dimensional (3D)-stacked Dynamic Random Access Memory (DRAM) devices as an off-chip Last Level Cache (LLC), in addition to the several levels of dedicated cache memories of each CPU core. Entries of a lookup table are distributed across all banks and vaults to exploit both bank interleaving and vault-level memory access parallelism. Frequently accessed entries in the 3D-stacked DRAM are also cached in the dedicated on-chip cache memories of each CPU core. The evaluation results show that, compared to a conventional architecture with a common on-chip LLC, the proposed architecture reduces memory access latency by 57%, increases throughput by 100%, and reduces blocking probability by about 10%. These results indicate that 3D-stacked DRAM is practical as an off-chip LLC for parallel packet processing running on multiple CPU cores simultaneously.
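The entry-distribution scheme described above can be sketched as a simple address-mapping function. The vault and bank counts below are illustrative assumptions (not taken from the paper), chosen only to show how consecutive table entries can be spread across vaults first, then banks, so that concurrent lookups hit different vaults and banks:

```python
# Illustrative sketch of interleaving lookup-table entries across the
# vaults and banks of a 3D-stacked DRAM device.
# NUM_VAULTS and BANKS_PER_VAULT are hypothetical parameters.
NUM_VAULTS = 32
BANKS_PER_VAULT = 16

def place_entry(entry_index):
    """Map a table-entry index to a (vault, bank) pair.

    Consecutive entries land in different vaults (vault-level
    parallelism); once all vaults are used, the next entries
    rotate to the next bank (bank interleaving).
    """
    vault = entry_index % NUM_VAULTS
    bank = (entry_index // NUM_VAULTS) % BANKS_PER_VAULT
    return vault, bank

# Consecutive entries spread across all vaults before reusing a bank:
print(place_entry(0))           # → (0, 0)
print(place_entry(1))           # → (1, 0)
print(place_entry(NUM_VAULTS))  # → (0, 1)
```

With this kind of mapping, lookups for unrelated flows are likely to target distinct vaults and banks, which is what allows the architecture to overlap memory accesses instead of serializing them on a single bank.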
