Abstract

As more and more large-scale scientific facilities are built, the demand for HPC at IHEP keeps growing. RDMA is a technology that allows servers in a network to exchange data in main memory without involving the processor, cache or operating system of either server, and it can provide high bandwidth and low latency. There are two main RDMA technologies: InfiniBand and a relative newcomer called RoCE (RDMA over Converged Ethernet). This paper introduces the RoCE technology, compares the performance of InfiniBand and RoCE in the IHEP data center, and evaluates the application scenarios of RoCE in order to support the future technology selection for HEPS. Finally, we present our future plans.

Highlights

  • Modern data centers are tasked with delivering intelligent multimedia responses to real-time human interactions

  • RDMA over Converged Ethernet (RoCE) is a network protocol defined in the InfiniBand Trade Association (IBTA) standard, allowing remote direct memory access (RDMA) over a converged Ethernet network

  • RoCE v2: The RoCE v2 protocol overcomes the limitation of version 1, which is bound to a single broadcast domain (VLAN), by encapsulating the RDMA transport in routable UDP/IP packets, as illustrated in the sketch below
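
RoCE v2 traffic is routable because the InfiniBand transport headers are carried inside ordinary UDP/IP packets (IANA has assigned UDP destination port 4791 to RoCE v2). The C sketch below is only an illustration of that encapsulation written for this summary; the struct name and the collapsed field layout are simplifications, not the authoritative wire-format definitions from the IBTA specification.

    /* Simplified view of a RoCE v2 frame.  The InfiniBand transport headers
     * travel as an ordinary UDP payload, so standard IP routing applies;
     * RoCE v1, by contrast, places the IB headers directly after the Ethernet
     * header (Ethertype 0x8915) and is therefore confined to one L2 domain.
     * Field layout is illustrative: flag and reserved bits are collapsed. */
    #include <stdint.h>
    #include <stdio.h>

    #define ROCEV2_UDP_DST_PORT 4791u   /* IANA-assigned UDP port for RoCE v2 */

    /* Simplified InfiniBand Base Transport Header (BTH), 12 bytes on the wire:
     * Ethernet | IPv4/IPv6 | UDP (dst port 4791) | BTH | payload | ICRC */
    struct rocev2_bth {
        uint8_t  opcode;    /* e.g. SEND, RDMA WRITE, RDMA READ request        */
        uint8_t  flags;     /* solicited event, migration, pad count, version  */
        uint16_t pkey;      /* partition key                                   */
        uint32_t dest_qp;   /* 8 reserved bits + 24-bit destination queue pair */
        uint32_t psn;       /* ack-request bit + 24-bit packet sequence number */
    };

    int main(void) {
        printf("RoCE v2 UDP destination port: %u\n", ROCEV2_UDP_DST_PORT);
        printf("Simplified BTH size: %zu bytes\n", sizeof(struct rocev2_bth));
        return 0;
    }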

Summary

Introduction

Modern data centers are tasked with delivering intelligent multimedia responses to real-time human interactions. Generalized cloud infrastructure is being deployed in the data center of IHEP. The key to advancing cloud infrastructure to this level is the elimination of loss in the network, which includes not just packet loss but also throughput loss and latency loss; ideally there should be no loss in the data center network. New advancements in high-speed distributed solid-state storage, coupled with remote direct memory access (RDMA) and new networking technologies that better manage congestion, are allowing these parallel environments to run atop more generalized cloud infrastructure. RDMA is a technology that allows servers in a network to exchange data in main memory without involving the processor, cache or operating system of either server, providing high bandwidth and low latency [1]. We compare the two technologies architecturally and evaluate their performance in general MPI (Message Passing Interface) scenarios.
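
As a concrete illustration of the MPI-level evaluation referred to above, a two-rank ping-pong test can be run over both the InfiniBand and the RoCE fabric (the MPI library selects the RDMA transport underneath). The sketch below is a minimal example written for this summary, not the benchmark code used in the paper; the message size and iteration counts are arbitrary choices.

    /* Minimal MPI ping-pong latency sketch: rank 0 and rank 1 exchange a small
     * message many times and report the average one-way latency.  Build with an
     * MPI compiler wrapper (e.g. mpicc) and launch two ranks on the fabric under
     * test (InfiniBand or RoCE) to compare the results. */
    #include <mpi.h>
    #include <stdio.h>

    #define MSG_BYTES   8       /* small message: exposes latency, not bandwidth */
    #define WARMUP      100
    #define ITERATIONS  10000

    int main(int argc, char **argv) {
        int rank, size;
        char buf[MSG_BYTES] = {0};

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (size < 2) {
            if (rank == 0) fprintf(stderr, "Run with at least 2 ranks\n");
            MPI_Finalize();
            return 1;
        }

        if (rank < 2) {
            int peer = 1 - rank;
            double start = 0.0;

            /* Warm-up plus timed iterations; timing starts after the warm-up so
             * that connection setup does not skew the measurement. */
            for (int i = 0; i < WARMUP + ITERATIONS; i++) {
                if (i == WARMUP) start = MPI_Wtime();
                if (rank == 0) {
                    MPI_Send(buf, MSG_BYTES, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, MSG_BYTES, MPI_CHAR, peer, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                } else {
                    MPI_Recv(buf, MSG_BYTES, MPI_CHAR, peer, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    MPI_Send(buf, MSG_BYTES, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
                }
            }
            double elapsed = MPI_Wtime() - start;

            if (rank == 0)
                printf("Average one-way latency: %.2f us\n",
                       elapsed / ITERATIONS / 2.0 * 1e6);
        }

        MPI_Finalize();
        return 0;
    }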

InfiniBand
Experimental Setup
Latency
Future Work
Mellanox