Abstract

Remote Direct Memory Access (RDMA) has become one of the most promising networking technologies in data centers, but it is vulnerable to link failures. Several schemes have been proposed that exploit redundant network paths in RDMA to mitigate this problem. However, these schemes do not scale well because they rely on a connection-based transport mode, which consumes more RNIC cache memory than a datagram-based mode. In this paper, we propose MPRUD, a novel multi-path transport for RDMA based on Unreliable Datagram (UD). MPRUD addresses the scalability problem by employing UD queue pairs (QPs), which occupy only a small amount of cache memory, and achieves robustness through failure detection and recovery algorithms. Our evaluation shows that MPRUD achieves line-rate bandwidth while using multiple paths and recovers from link failures about 72x faster than a default single-path flow, reducing data loss to 0.12% of the single-path baseline (from 4.5 GB to 5.6 MB).
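The data-loss figure can be checked with simple arithmetic. The sketch below is illustrative only: the function name `loss_ratio_percent` is hypothetical, and decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes) are an assumption, since the abstract does not say which convention is used.

```python
# Sanity check of the reported data-loss reduction.
# Assumption: decimal units (1 GB = 1e9 bytes, 1 MB = 1e6 bytes).

def loss_ratio_percent(baseline_bytes: float, mprud_bytes: float) -> float:
    """Return the multi-path loss as a percentage of the single-path loss."""
    return mprud_bytes / baseline_bytes * 100.0

single_path_loss = 4.5e9  # 4.5 GB lost by the default single-path flow
mprud_loss = 5.6e6        # 5.6 MB lost with MPRUD

ratio = loss_ratio_percent(single_path_loss, mprud_loss)
print(f"{ratio:.2f}%")  # prints "0.12%"
```

Under this assumption the ratio is about 0.124%, which agrees with the rounded 0.12% stated above.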
