Abstract

Packet processing performance in a Network Function Virtualization (NFV)-aware environment depends on the memory access performance of commercial-off-the-shelf (COTS) hardware systems. Table lookup is a typical packet processing task that depends heavily on memory access performance. Thus, the on-chip cache memories of the CPU are becoming increasingly critical for many high-performance software routers and switches. Moreover, in carrier networks, multiple applications run in parallel on the same hardware system, which demands large cache capacity. In this paper, we propose a packet processing architecture that enhances memory access parallelism by combining on-chip last-level cache (LLC) slices with off-chip interleaved 3-dimensional (3D)-stacked Dynamic Random Access Memory (DRAM) devices. Table entries are stored in the off-chip 3D-stacked DRAM so that memory requests are processed in parallel through bank interleaving and channel parallelism. In addition, cached entries are distributed across on-chip LLC slices according to a memory-address-based hash function so that each CPU core can access the on-chip LLC in parallel. The evaluation results show that, compared to an architecture with an on-chip shared LLC and one without an on-chip LLC, the proposed architecture reduces memory access latency by 62% and 12%, increases throughput by 108% and 2%, and reduces the blocking probability of memory requests by 96% and 50%, respectively.
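To make the two parallelism mechanisms concrete, the sketch below shows how a physical address might be mapped to an LLC slice (via an address-based hash) and to a DRAM channel/bank (via low-order interleaving). The line size, slice count, channel/bank counts, and the XOR hash are illustrative assumptions, not the paper's exact parameters; real CPUs use undocumented slice-hash functions.

```python
# Illustrative address mapping (assumed parameters, not the paper's exact design):
# 64 B cache line, 8 LLC slices, 4 DRAM channels, 16 banks per channel.

LINE_BITS = 6          # log2(64 B cache line)
NUM_SLICES = 8
NUM_CHANNELS = 4
BANKS_PER_CHANNEL = 16

def llc_slice(addr: int) -> int:
    """Address-based hash that spreads cache lines across LLC slices,
    letting different cores access different slices in parallel.
    A simple XOR-fold stands in for the CPU's undocumented hash."""
    line = addr >> LINE_BITS
    return (line ^ (line >> 3) ^ (line >> 6)) % NUM_SLICES

def dram_bank(addr: int) -> tuple[int, int]:
    """Low-order interleaving: consecutive cache lines land on
    different channels (and then banks), so a burst of table-entry
    reads can be serviced concurrently."""
    line = addr >> LINE_BITS
    channel = line % NUM_CHANNELS
    bank = (line // NUM_CHANNELS) % BANKS_PER_CHANNEL
    return channel, bank

# Consecutive cache lines rotate across channels, enabling
# channel parallelism for back-to-back table lookups.
for a in range(0, 4 * 64, 64):
    print(hex(a), llc_slice(a), dram_bank(a))
```

The key property is that neighboring table entries never queue behind one another at a single slice or bank, which is exactly what the combined LLC-slice/3D-DRAM design exploits.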

Highlights

  • Packet processing performance in a Network Function Virtualization (NFV)-aware environment depends on the memory access performance of commercial-off-the-shelf (COTS) hardware systems

  • Table entries are stored in the off-chip 3-dimensional (3D)-stacked Dynamic Random Access Memory (DRAM) so that memory requests are processed in parallel through bank interleaving and channel parallelism

  • The system model consists of a CPU with six cores, each with a queue in front of it and dedicated level 1 (L1) and level 2 (L2) caches; a shared LLC with a queue in front of it; and a 3D-stacked DRAM with its controller
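Since the evaluation reports the blocking probability of memory requests queued at the LLC and DRAM, a standard Erlang-B computation illustrates the kind of quantity involved. This is a generic queueing formula under an assumed loss-system model; the paper's actual traffic model may differ.

```python
def erlang_b(offered_load: float, servers: int) -> float:
    """Erlang-B blocking probability for a loss system with the given
    offered load (in Erlangs) and number of servers (e.g. DRAM banks
    that can serve requests concurrently), via the stable recurrence
    B(0) = 1;  B(m) = a*B(m-1) / (m + a*B(m-1))."""
    b = 1.0
    for m in range(1, servers + 1):
        b = offered_load * b / (m + offered_load * b)
    return b

# More parallel "servers" (banks/channels) sharply reduce blocking
# at the same offered load -- the intuition behind bank interleaving.
print(erlang_b(4.0, 4))   # few banks: high blocking
print(erlang_b(4.0, 16))  # many banks: low blocking
```

The qualitative trend matches the paper's result that adding memory-level parallelism (more banks/channels, more LLC slices) cuts the blocking probability of memory requests.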


Summary

INTRODUCTION

Packet processing performance in a Network Function Virtualization (NFV)-aware environment depends on the memory access performance of commercial-off-the-shelf (COTS) hardware systems. These network functions consist of several packet processing elements, such as parsing, classification, editing, and metering, each of which requires table lookups and memory accesses. When these network functions run on the same hardware system, usually called a multi-tenant environment, multiple applications issue many memory accesses from their corresponding CPU cores in parallel. This situation requires both speed and capacity of cache memories for high-performance packet processing. No prior work evaluates the performance dependency of the proposed architecture on the number of assigned resources when combining LLC slices with 3D-stacked DRAM. This paper proposes a packet processing architecture that enhances memory access parallelism by combining on-chip LLC slices and off-chip 3D-stacked DRAM devices.
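The classification step mentioned above reduces to a table lookup keyed by packet header fields. The hypothetical sketch below (the 5-tuple key, table contents, and action strings are illustrative, not from the paper) shows why each classified packet costs at least one memory access: every lookup is a hash-table probe into a table that may or may not be resident in cache.

```python
# Hypothetical flow-classification lookup: each probe into flow_table
# is a memory access whose latency depends on whether the entry sits
# in an LLC slice (fast) or must be fetched from DRAM (slow).

from typing import NamedTuple, Optional

class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int  # e.g. 6 = TCP, 17 = UDP

# Illustrative table entries (assumed, not from the paper).
flow_table: dict[FiveTuple, str] = {
    FiveTuple("10.0.0.1", "10.0.0.2", 1234, 80, 6): "forward:port1",
    FiveTuple("10.0.0.3", "10.0.0.4", 5678, 53, 17): "forward:port2",
}

def classify(pkt: FiveTuple) -> Optional[str]:
    """One table lookup per packet; returns the action or None on miss."""
    return flow_table.get(pkt)
```

With millions of such lookups per second across multiple cores, the table's placement across LLC slices and DRAM banks, as proposed in this paper, determines the achievable throughput.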

BACKGROUND
TRAFFIC MODEL
BLOCKING PROBABILITY AND AVERAGE WAITING TIME
NUMERICAL SIMULATION RESULTS
RELATED WORK
DISCUSSION
Findings
CONCLUSION

