Abstract

Remote Direct Memory Access (RDMA) has been widely deployed in datacenters for its high performance. Large-scale, high-performance cloud services built on geographically distributed datacenters require long-range RDMA to meet their performance requirements. However, existing RDMA solutions can hardly satisfy the stringent throughput and delay requirements of these emerging services. On the one hand, lossless RDMA requires deep buffers and may achieve suboptimal throughput for inter-datacenter traffic because of the delayed response to Priority Flow Control (PFC) messages. On the other hand, lossy RDMA with selective retransmission performs poorly when multiple flows with different round-trip times (RTTs) coexist in cross-datacenter scenarios. This article proposes Swing, which extends high-performance lossless RDMA to long-distance links through PFC-Relay. Swing ensures the throughput of long-distance links while minimizing the buffer requirement for long-range RDMA. It enables long-range RDMA without any modifications to existing in-datacenter networks. The evaluation shows that Swing can reduce the average flow completion time (FCT) by 14%-66% in a variety of traffic scenarios.
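
To make the deep-buffer problem concrete, the following minimal sketch estimates how much data can still arrive at a receiver after it sends a PFC pause, using the common approximation that in-flight data is about link rate times round-trip propagation delay. It is an illustration only, not the model used by Swing: the function name, the fiber propagation speed of roughly 2 x 10^5 km/s, and the example link rate and distances are assumptions, and switch processing delay and frames already being serialized are ignored.

```python
def pfc_headroom_bytes(link_gbps: float, distance_km: float,
                       fiber_km_per_s: float = 2.0e5) -> float:
    """Rough estimate of bytes that keep arriving after a PFC pause is sent.

    Assumption: the pause takes one propagation delay to reach the sender,
    and data already on the wire takes another one to drain, so the receiver
    must absorb about (link rate) x (round-trip propagation delay).
    """
    one_way_s = distance_km / fiber_km_per_s   # propagation delay, one direction
    rtt_s = 2 * one_way_s                      # pause out + in-flight data back
    bytes_per_s = link_gbps * 1e9 / 8          # link rate in bytes per second
    return bytes_per_s * rtt_s


if __name__ == "__main__":
    # Hypothetical example values: a 100 Gbps link inside a datacenter (100 m)
    # versus the same link rate across 100 km between datacenters.
    for label, km in [("intra-DC, 0.1 km", 0.1), ("inter-DC, 100 km", 100.0)]:
        print(f"{label}: ~{pfc_headroom_bytes(100, km) / 1e6:.4f} MB per lossless queue")
```

Under these assumptions the required headroom grows linearly with distance, from roughly kilobytes inside a datacenter to roughly megabytes per lossless queue over a 100 km link, which is the buffer pressure the abstract attributes to delayed PFC response.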
