Abstract

Large-scale distributed machine learning (ML) systems must transfer large numbers of parameters, introducing high communication overhead that significantly degrades performance. In recent years, most Ethernet and InfiniBand network interface cards have come to support RDMA (Remote Direct Memory Access), a technology that offers superior communication performance; however, many distributed ML systems have yet to benefit from it. MXNet is one such state-of-the-art distributed ML system, and KVStore is its key module for parameter synchronization. In this paper we describe the design and implementation of RM-KVStore, a new high-performance RDMA-capable MXNet KVStore. It exploits RDMA to improve throughput by providing a memory pool and separate transfer mechanisms for small and large messages, achieving highly time-efficient transfers. Experimental results show that RM-KVStore outperforms the KVStore over IPoIB and TCP/IP by 225% and 331% on average, respectively, in terms of push performance. For the pull operation, RM-KVStore achieves higher performance than the KVStore over IPoIB and TCP/IP by 141% and 247% on average, respectively. Consequently, the overall performance of MXNet is improved by 112% and 142% compared with running over IPoIB and TCP/IP, respectively.
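
The abstract only names the small/large-message split; as a rough illustration of that common RDMA pattern (not the paper's actual code), the sketch below dispatches a transfer by payload size using libibverbs. It assumes a buffer drawn from a pre-registered memory pool, an established queue pair, and a hypothetical inline threshold; the function name and threshold are illustrative. Small payloads go out as inline two-sided sends, while large ones use a one-sided, zero-copy RDMA write.

    /* Minimal sketch of size-based RDMA dispatch (illustrative, not the
     * paper's implementation). Assumes qp is a connected queue pair and
     * mr registers a buffer from a pre-registered memory pool. */
    #include <string.h>
    #include <infiniband/verbs.h>

    /* Hypothetical threshold; must not exceed the QP's max_inline_data. */
    #define SMALL_MSG_MAX 256

    int post_message(struct ibv_qp *qp, struct ibv_mr *mr,
                     void *buf, size_t len,
                     uint64_t remote_addr, uint32_t rkey)
    {
        struct ibv_sge sge = {
            .addr   = (uintptr_t)buf,
            .length = (uint32_t)len,
            .lkey   = mr->lkey,
        };
        struct ibv_send_wr wr, *bad_wr = NULL;
        memset(&wr, 0, sizeof(wr));
        wr.sg_list    = &sge;
        wr.num_sge    = 1;
        wr.send_flags = IBV_SEND_SIGNALED;

        if (len <= SMALL_MSG_MAX) {
            /* Small message: two-sided send with the payload copied
             * inline into the work request, so the NIC needs no extra
             * DMA read of the source buffer. */
            wr.opcode      = IBV_WR_SEND;
            wr.send_flags |= IBV_SEND_INLINE;
        } else {
            /* Large message: one-sided RDMA write directly into the
             * receiver's registered buffer, avoiding intermediate
             * copies. (Receiver notification is omitted here; real
             * systems follow up with a send or write-with-immediate.) */
            wr.opcode              = IBV_WR_RDMA_WRITE;
            wr.wr.rdma.remote_addr = remote_addr;
            wr.wr.rdma.rkey        = rkey;
        }
        return ibv_post_send(qp, &wr, &bad_wr);
    }

The appeal of this split is that inline sends minimize per-message latency for the many small control and gradient updates, while one-sided writes sustain high bandwidth for bulk parameter transfers without involving the remote CPU.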
