Abstract

In modern multi-tenant data centers, each tenant expects dependable service from the virtualized network fabric: bandwidth guarantees with work conservation, bounded tail latency, and resilient reachability. However, prior works converge slowly under network dynamics and uncertainties, so they can hardly provide such dependability to tenants. Further, state-of-the-art load-balancing schemes are guarantee-agnostic and risk breaking bandwidth guarantees, a problem overlooked by prior work. In this paper, we propose a dependable virtualized fabric framework that can (1) quickly detect network failures in the data plane, (2) explicitly select proper paths for all flows, and (3) converge to the ideal bandwidth allocation in under a millisecond. The core idea of our framework is to leverage the programmable data plane to fuse an active edge (e.g., the NIC) with an informative core (e.g., the switch): the core sends link status and tenant information to the edge via telemetry, helping the edge make timely and accurate decisions on path selection and traffic admission. We fully implement the framework with commodity SmartNICs and programmable switches. Extensive evaluations show that it maintains bandwidth guarantees with high bandwidth utilization, low and bounded latency, and resilient reachability under various network scenarios, with limited overhead. Application-level experiments show that it improves QPS by 2.4× and cuts tail latency by 10× compared to the alternatives.
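To make the edge/core split concrete, the following is a minimal Python sketch, under our own assumptions, of how an edge could use switch-reported telemetry to pick a path that still honors a flow's bandwidth guarantee and avoids failed links. The `LinkTelemetry` record, its fields, the link names, and `select_path` are hypothetical; the abstract does not specify the actual telemetry format or selection policy.

```python
from dataclasses import dataclass


@dataclass
class LinkTelemetry:
    """Hypothetical per-link report sent from the core (switch) to the edge (NIC)."""
    up: bool                 # link status as detected in the data plane
    capacity_gbps: float     # physical link capacity
    guaranteed_gbps: float   # sum of tenant guarantees already placed on this link

    def residual_guarantee(self) -> float:
        # Capacity not yet promised to any tenant; a failed link offers nothing.
        return self.capacity_gbps - self.guaranteed_gbps if self.up else 0.0


def select_path(paths, telemetry, guarantee_gbps):
    """Return the (non-empty) candidate path whose bottleneck residual is largest
    and still covers the flow's guarantee, or None if no path qualifies.
    On ties, the earliest candidate wins."""
    best_path, best_bottleneck = None, -1.0
    for path in paths:
        bottleneck = min(telemetry[link].residual_guarantee() for link in path)
        if bottleneck >= guarantee_gbps and bottleneck > best_bottleneck:
            best_path, best_bottleneck = path, bottleneck
    return best_path


# Example: the first path crosses a failed link, so the edge steers onto the second.
telemetry = {
    "e1-a1": LinkTelemetry(up=True, capacity_gbps=100, guaranteed_gbps=60),
    "a1-c1": LinkTelemetry(up=False, capacity_gbps=100, guaranteed_gbps=0),
    "e1-a2": LinkTelemetry(up=True, capacity_gbps=100, guaranteed_gbps=30),
    "a2-c1": LinkTelemetry(up=True, capacity_gbps=100, guaranteed_gbps=50),
}
paths = [["e1-a1", "a1-c1"], ["e1-a2", "a2-c1"]]
print(select_path(paths, telemetry, 20))  # ['e1-a2', 'a2-c1']
print(select_path(paths, telemetry, 60))  # None
```

Choosing the widest bottleneck is only one plausible policy for illustration; the point is that fresh link status and per-tenant guarantee information at the edge let it reject a failed or oversubscribed path before admitting traffic.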
