Dart: Divide and Specialize for Fast Response to Congestion in RDMA-Based Datacenter Networks

Jiachen Xue, T N Vijaykumar, Muhammad Usama Chaudhry, Balajee Vamanan, Mithuna Thottethodi

doi:10.1109/tnet.2019.2961671

Abstract

Though Remote Direct Memory Access (RDMA) promises to reduce datacenter network latencies significantly compared to TCP (e.g., 10x), end-to-end congestion control in the presence of incasts is a challenge. Targeting the full generality of the congestion problem, previous schemes rely on slow, iterative convergence to the appropriate sending rates (e.g., TIMELY takes 50 RTTs). Several papers have shown that even in oversubscribed datacenter networks most congestion occurs at the receiver. Accordingly, we propose a divide-and-specialize approach, called Dart, which isolates the common case of receiver congestion and further subdivides the remaining in-network congestion into the simpler spatially-localized and the harder spatially-dispersed cases. For receiver congestion, we propose direct apportioning of sending rates (DASR), in which a receiver for n senders directs each sender to cut its rate by a factor of n, converging in only one RTT. For the spatially-localized case, Dart provides fast (under one RTT) response by adding novel switch hardware for in-order flow deflection (IOFD), because RDMA disallows the packet reordering on which previous load-balancing schemes rely. For the uncommon spatially-dispersed case, Dart falls back to DCQCN. In small-scale testbed measurements, Dart achieves 60% (2.5x) lower 99th-percentile latency than InfiniBand at similar throughput; in at-scale simulations, it achieves 79% (4.8x) lower 99th-percentile latency and 58% higher throughput than TIMELY and DCQCN.
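
As a rough illustration of the DASR idea described above, the following minimal sketch computes the rate a receiver would direct each of its n senders to use. It assumes, beyond what the abstract states, that every sender starts at the receiver's line rate, so cutting by a factor of n yields an even share. The function name `apportion_rates`, the Gb/s unit, and the sender labels are hypothetical and are not taken from the paper.

```python
def apportion_rates(line_rate_gbps: float, senders: list[str]) -> dict[str, float]:
    """Return the per-sender rate after a cut by a factor of n.

    Assumption (not from the paper): each sender currently transmits at the
    receiver's line rate, so dividing by n gives every sender an equal share.
    """
    n = len(senders)
    if n == 0:
        # No active senders, so there is nothing to apportion.
        return {}
    per_sender = line_rate_gbps / n  # cut each sender's rate by a factor of n
    return {sender: per_sender for sender in senders}


if __name__ == "__main__":
    # Hypothetical incast: four senders converge on one 100 Gb/s receiver.
    rates = apportion_rates(100.0, ["s1", "s2", "s3", "s4"])
    for sender, rate in rates.items():
        print(f"{sender}: {rate} Gb/s")
```

Because the receiver knows n directly, a single message per sender is enough to set the rates, which is consistent with the one-RTT convergence the abstract claims; the sketch does not model how the receiver counts its senders or delivers those messages.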

Journal: IEEE/ACM Transactions on Networking
Publication Date: Feb 1, 2020
Citations: 67
License type: cc-by
