Abstract
Longest prefix matching (LPM) is a fundamental process in IP routing used not only in traditional hardware routers but also in software middleboxes. However, the performance of LPM in software is still insufficient for processing packets at over 100 Gbps, although previous studies have tackled this issue by exploiting the CPU cache or accelerators such as GPUs. To further improve the performance of software LPM, we propose a novel LPM method called Spider, which exploits the single-instruction multiple-data (SIMD) mechanism in the CPU. Spider performs LPM for up to 16 destination IP addresses in parallel, using a routing table structure carefully designed for processing with SIMD instructions. We evaluated Spider from three perspectives: the improvement in LPM performance derived from the parallelism provided by the SIMD mechanism, a performance comparison with other methods, and performance scalability. The evaluation shows that Spider dramatically improves LPM performance, reaching 1.8-3.2 times the throughput of state-of-the-art methods. Moreover, Spider achieves 5,074 million lookups per second with 16 CPU cores, which is equivalent to a processing capacity of 3.4 Tbps with short packets; this performance opens up the possibility of terabit-class packet processing in software.
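For intuition, the following is a minimal scalar sketch of what each of the up-to-16 parallel lookups computes. It is not Spider's SIMD routing table structure, only the reference semantics of LPM: among all prefixes covering a destination address, the one with the longest prefix length wins. All names here (`build_table`, `lookup_batch`, the example routes) are illustrative, not from the paper.

```python
# Reference semantics of longest prefix matching (LPM) for IPv4.
# NOT Spider's SIMD table layout: a scalar sketch for intuition only.
import ipaddress

def build_table(routes):
    """routes: list of (CIDR string, next hop). Returns entries as
    (network_int, prefix_len, next_hop), sorted by descending prefix
    length so the first match is the longest match."""
    table = []
    for cidr, nh in routes:
        net = ipaddress.ip_network(cidr)
        table.append((int(net.network_address), net.prefixlen, nh))
    table.sort(key=lambda e: -e[1])
    return table

def lookup(table, addr):
    """Return the next hop of the longest matching prefix, or None."""
    a = int(ipaddress.ip_address(addr))
    for net, plen, nh in table:
        mask = ((1 << plen) - 1) << (32 - plen) if plen else 0
        if (a & mask) == net:
            return nh
    return None

def lookup_batch(table, addrs):
    """Look up a batch of destination addresses. Spider batches up to
    16 addresses and resolves them in parallel with SIMD instructions;
    here the loop is sequential, but the per-address result is the same."""
    return [lookup(table, a) for a in addrs]

routes = [("10.0.0.0/8", "A"), ("10.1.0.0/16", "B"), ("0.0.0.0/0", "C")]
table = build_table(routes)
print(lookup_batch(table, ["10.1.2.3", "10.2.3.4", "192.0.2.1"]))
# -> ['B', 'A', 'C']
```

Note that 10.1.2.3 is covered by both 10.0.0.0/8 and 10.1.0.0/16, and LPM selects the /16; batching the lookups, as in `lookup_batch`, is what makes a SIMD formulation like Spider's possible.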
Highlights
Longest prefix matching (LPM) is a fundamental process of IP routing in both hardware routers and software middleboxes
Although software LPM cannot deliver as much performance as hardware, software middleboxes are actively used for various use cases, e.g., network function virtualization (NFV) [1], [2], software routers for backbone networks [3], and software-defined WANs [4]
In this paper, we have proposed Spider, which achieves an improvement of the LPM performance by parallelizing its lookup procedure in a single CPU core
Summary
Longest prefix matching (LPM) is a fundamental process of IP routing in both hardware routers and software middleboxes. A major approach for fast LPM in software is to shorten the time for looking up a destination IP address by leveraging the CPU cache to minimize the latency of data accesses [5]-[12]. However, the performance of these methods has not yet reached the speed of multiple 100 Gbps interfaces, and further improvement would be limited: because they have already thoroughly exploited the CPU cache, their remaining improvement factor is an increase in CPU frequency, which has stagnated [13]. The evaluation is extended from our previous work to reveal more detailed characteristics of Spider, including its applicability to real-world packet processing applications (§ V-C), the CPU cycles spent on the lookup procedure (§ V-D), and the performance under different CPU frequencies (§ V-E). The evaluation shows that Spider achieves a major improvement (1.8-2.6 times for IPv4 and 2.2-3.2 times for IPv6) over state-of-the-art methods and delivers the processing capacity of 34 ports of 100 Gbps interfaces with 16 CPU cores. The performance improvement of Spider opens up the possibility of terabit-class packet processing in software.