Abstract

Longest prefix matching (LPM) is a fundamental process in IP routing, used not only in traditional hardware routers but also in software middleboxes. However, the performance of LPM in software is still insufficient for processing packets at over 100 Gbps, although previous studies have tackled this issue by exploiting the CPU cache or accelerators such as GPUs. To further improve the performance of software LPM, we propose a novel LPM method called Spider, which exploits the single-instruction multiple-data (SIMD) mechanism in the CPU. Spider performs LPM for up to 16 destination IP addresses in parallel by using a routing table structure carefully designed for processing with SIMD instructions. We evaluated Spider from three perspectives: the improvement in LPM performance derived from the parallelism provided by the SIMD mechanism, a performance comparison with other methods, and performance scalability. The evaluation shows that Spider dramatically improves LPM performance, reaching 1.8 to 3.2 times that of the state-of-the-art methods. Moreover, Spider achieves 5,074 million lookups per second with 16 CPU cores, which is equivalent to a processing capacity of 3.4 Tbps for short packets; this performance opens up the possibility of packet processing at terabit-class rates in software.
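The 3.4 Tbps equivalence can be checked with a short calculation. The sketch below assumes that "short packets" means minimum-size 64-byte Ethernet frames plus 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, and inter-frame gap). The abstract does not state the packet size, so this is an assumption, and the constant and function names are illustrative only.

```python
# Assumption: one lookup corresponds to one minimum-size Ethernet frame.
FRAME_BYTES = 64          # minimum Ethernet frame size (assumed "short packet")
WIRE_OVERHEAD_BYTES = 20  # 8 bytes preamble + SFD, 12 bytes inter-frame gap
PORT_BPS = 100_000_000_000  # one 100 Gbps interface


def wire_rate_bps(lookups_per_sec: int) -> int:
    """Bits per second on the wire if every lookup forwards one short frame."""
    return lookups_per_sec * (FRAME_BYTES + WIRE_OVERHEAD_BYTES) * 8


def equivalent_100g_ports(lookups_per_sec: int) -> int:
    """Number of fully loaded 100 Gbps ports that the lookup rate can sustain."""
    return wire_rate_bps(lookups_per_sec) // PORT_BPS


if __name__ == "__main__":
    rate = 5_074_000_000  # 5,074 million lookups per second, from the abstract
    print(f"{wire_rate_bps(rate) / 1e12:.1f} Tbps")  # prints "3.4 Tbps"
    print(equivalent_100g_ports(rate))               # prints 34
```

Under these assumptions each lookup accounts for 672 bits on the wire, so 5,074 million lookups per second corresponds to roughly 3.41 Tbps, which matches the reported figure.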

Highlights

  • Longest prefix matching (LPM) is a fundamental process of IP routing in both hardware routers and software middleboxes

  • Although software LPM cannot deliver as much performance as hardware, software middleboxes are actively used for various use cases, e.g., network function virtualization (NFV) [1], [2], software routers for backbone networks [3], and software-defined WAN [4]

  • In this paper, we propose Spider, which improves LPM performance by parallelizing the lookup procedure within a single CPU core


Summary

INTRODUCTION

Longest prefix matching (LPM) is a fundamental process of IP routing in both hardware routers and software middleboxes. A major approach for fast LPM in software is to shorten the time for looking up a destination IP address by leveraging the CPU cache to minimize the latency for accessing data [5]–[12]. However, the performance of these methods has not yet reached the speed of multiple 100 Gbps interfaces, and their further improvement would be limited: because they have already thoroughly exploited the CPU cache, the remaining factor for improvement is an increase in CPU frequency, which has stagnated [13]. The evaluation is extended from our previous work to reveal more detailed characteristics of Spider, including the applicability to real-world packet processing applications (§ V-C), the CPU cycles needed to process the lookup procedure (§ V-D), and the performance under different CPU frequencies (§ V-E). The evaluation shows that Spider achieves a major improvement (1.8–2.6 times for IPv4, and 2.2–3.2 times for IPv6) compared with the state-of-the-art methods and delivers a processing capacity equivalent to 34 ports of 100 Gbps interfaces with 16 CPU cores. The performance improvement of Spider opens up the possibility of packet processing at terabit-class rates in software.
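For readers unfamiliar with the operation being accelerated, the following is a minimal scalar sketch of what an LPM lookup computes: among all routes whose prefix contains the destination address, it returns the one with the longest prefix. It is a naive linear scan that handles one address at a time and is not Spider's routing table structure or its SIMD lookup procedure. The routing table, next-hop names, and function name are hypothetical.

```python
import ipaddress

# Hypothetical routing table: (prefix, next hop) pairs for illustration only.
ROUTES = [
    ("0.0.0.0/0", "gw-default"),
    ("10.0.0.0/8", "gw-a"),
    ("10.1.0.0/16", "gw-b"),
    ("2001:db8::/32", "gw-v6"),
]


def longest_prefix_match(table, dst):
    """Return the next hop of the most specific route containing dst, or None."""
    addr = ipaddress.ip_address(dst)
    best_len, best_hop = -1, None
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        # Only compare routes of the same IP version as the destination.
        if addr.version != net.version:
            continue
        # Keep the matching route with the longest prefix length seen so far.
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop


if __name__ == "__main__":
    print(longest_prefix_match(ROUTES, "10.1.2.3"))  # prints "gw-b"
    print(longest_prefix_match(ROUTES, "10.2.0.1"))  # prints "gw-a"
    print(longest_prefix_match(ROUTES, "8.8.8.8"))   # prints "gw-default"
```

The address 10.1.2.3 matches three routes (/0, /8, and /16), and the /16 route wins because it is the most specific. The cost of this scan grows with the table size and it processes a single address per call, which is the kind of per-lookup cost that cache-aware and parallel methods aim to reduce.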

RELATED WORK
APPLICABILITY TO REAL PACKET PROCESSING
EVALUATION
CONCLUSION