FastUDP: a highly scalable user-level UDP framework in multi-core systems for fast packet I/O

Hongjun Zhang, Libo Zhang, Yanjun Wu, Heng Zhang

doi:10.1007/s11227-020-03486-6

Abstract

Nowadays, many applications, such as network routers, distributed data processing engines, and firewalls, need to transfer packets at line rate. As data volumes grow, clusters in data centers suffer increasingly severe congestion from massive numbers of message packets. Building a high-performance streaming method for massive small message packets is fundamentally challenging. Although many works have been proposed to address these shortcomings, sending massive small packets via UDP in the traditional Linux kernel implementation remains inefficient, owing to high overhead from socket operations, suboptimal scalability in multi-core systems, and lack of support for multiple network interface card (NIC) ports. In this paper, we present FastUDP, a highly efficient and scalable user-level UDP-based network stack optimization for multi-core systems. FastUDP addresses these inefficiencies with three novel designs: (1) an exclusive thread model to improve scalability; (2) poll mode and batched operations to increase computing resource utilization; (3) a shared hugepage memory pool to eliminate context switch overhead. Moreover, to support high throughput, FastUDP also proposes a novel work-queue-based approach that allows packets to be transferred concurrently over multiple NIC ports. On a 40-core machine, our evaluation shows that FastUDP improves packet transfer throughput by up to 13× and reduces packet transfer latency by up to 4.14× compared to the latest Linux (4.4.0) UDP stack. In addition, it improves the performance of a realistic application (memcached) by 36% to 67% compared to the Linux stack.
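To make the poll-mode and batched-operation idea concrete, the following is a minimal sketch and not FastUDP's actual implementation. It assumes a toy in-memory receive queue owned by a single worker; the names `RxQueue`, `poll_burst`, `run_worker`, and the burst size of 32 are hypothetical and do not come from the paper. The worker polls the queue without blocking and processes packets in bursts, so the per-call overhead is paid once per burst rather than once per packet.

```python
from collections import deque

BURST_SIZE = 32  # hypothetical batch size, not taken from the paper


class RxQueue:
    """Toy stand-in for a receive queue owned by exactly one worker."""

    def __init__(self):
        self._ring = deque()

    def enqueue(self, packet):
        self._ring.append(packet)

    def poll_burst(self, max_packets=BURST_SIZE):
        """Return up to max_packets packets without blocking (may be empty)."""
        burst = []
        while self._ring and len(burst) < max_packets:
            burst.append(self._ring.popleft())
        return burst


def run_worker(queue, handle_batch, max_idle_polls=1):
    """Poll the queue in a loop and pass packets to handle_batch in bursts.

    A real poll-mode worker would spin indefinitely; this sketch stops after
    max_idle_polls consecutive empty polls so that it terminates.
    """
    idle = 0
    batches = 0
    while idle < max_idle_polls:
        burst = queue.poll_burst()
        if not burst:
            idle += 1
            continue
        idle = 0
        handle_batch(burst)  # one call handles the whole burst
        batches += 1
    return batches


q = RxQueue()
for i in range(100):
    q.enqueue(f"pkt-{i}".encode())

received = []
num_batches = run_worker(q, received.extend)
print(num_batches, len(received))
```

In this sketch, 100 queued packets are handled in 4 calls to the handler instead of 100, which illustrates the amortization that batching is meant to provide.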

Full Text