Abstract

Distributed processing using high-performance computing resources is essential for developers to train large-scale deep neural networks (DNNs). The major impediment to distributed DNN training is the communication bottleneck during the parameter exchange among the distributed training workers. This bottleneck increases training time and decreases the utilization of computational resources. Our previous study, SoftMemoryBox (SMB1), showed considerably better performance than the message passing interface (MPI) for parameter communication in distributed DNN training. However, SMB1 had several disadvantages: limited scalability of distributed DNN training due to the restricted communication bandwidth of a single memory server, the lack of a synchronization function for the shared memory buffer, and low portability and usability as a consequence of its kernel-level implementation. This paper proposes a scalable shared-memory buffer framework, SoftMemoryBox II (SMB2), which overcomes the shortcomings of SMB1. With SMB2, distributed training processes can easily share virtually unified shared memory buffers composed of memory segments provided by remote memory servers and can exchange DNN parameters at high speed through these buffers. The scalable communication bandwidth of the SMB2 framework reduces distributed DNN training times compared to SMB1. According to intensive evaluation results, the communication bandwidth of SMB2 is 6.3 times greater than that of SMB1 when the SMB2 framework is scaled out to eight memory servers. Moreover, SMB2-based asynchronous distributed training of five DNN models is up to 2.4 times faster than SMB1-based training.
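
As a rough sketch of the idea, the following Python code emulates a virtually unified parameter buffer whose storage is partitioned into segments, one per memory server. The MemorySegment and UnifiedBuffer classes, their method names, and the way the parameter space is striped are hypothetical simplifications for illustration only; they are not the actual SMB2 API.

    # Minimal sketch (not the real SMB2 API): a "virtually unified" parameter
    # buffer whose storage is split into segments, one per emulated memory server.
    import numpy as np

    class MemorySegment:
        """Stands in for a memory segment exported by one remote memory server."""
        def __init__(self, size):
            self.data = np.zeros(size, dtype=np.float32)  # local stand-in for remote memory

    class UnifiedBuffer:
        """Presents several segments as one flat parameter buffer."""
        def __init__(self, total_size, num_servers):
            self.seg_size = (total_size + num_servers - 1) // num_servers
            self.segments = [
                MemorySegment(max(0, min(self.seg_size, total_size - i * self.seg_size)))
                for i in range(num_servers)
            ]

        def write(self, offset, values):
            """Scatter a contiguous parameter slice across the underlying segments."""
            pos = 0
            while pos < len(values):
                seg_idx, seg_off = divmod(offset + pos, self.seg_size)
                n = min(len(values) - pos, self.seg_size - seg_off)
                self.segments[seg_idx].data[seg_off:seg_off + n] = values[pos:pos + n]
                pos += n

        def read(self, offset, length):
            """Gather a contiguous parameter slice from the underlying segments."""
            out = np.empty(length, dtype=np.float32)
            pos = 0
            while pos < length:
                seg_idx, seg_off = divmod(offset + pos, self.seg_size)
                n = min(length - pos, self.seg_size - seg_off)
                out[pos:pos + n] = self.segments[seg_idx].data[seg_off:seg_off + n]
                pos += n
            return out

    # A worker pushes its local parameters and later pulls the shared copy.
    params = UnifiedBuffer(total_size=10_000, num_servers=8)
    params.write(0, np.random.rand(10_000).astype(np.float32))
    local_weights = params.read(0, 10_000)

Because the parameter space is striped across several servers, reads and writes can be served by all servers in parallel, which illustrates why adding memory servers can increase the aggregate communication bandwidth.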

Highlights

  • Deep learning is currently implemented in numerous application domains including face recognition, image classification, object detection, visual relationship detection, speech recognition, and security [1]–[6]

  • We demonstrated the advantage of SMB1 by comparing the computation and communication times of different deep neural networks (DNNs) when distributed deep learning parameter communication was emulated using SMB1 and the message passing interface (MPI)

  • In this paper, we propose a scalable shared-memory buffer framework called SMB2, which can be used as a replacement for the parameter server in asynchronous distributed DNN training

Summary

INTRODUCTION

Deep learning is currently implemented in numerous application domains including face recognition, image classification, object detection, visual relationship detection, speech recognition, and security [1]–[6]. When a traditional message communication protocol based on TCP/IP, such as the message passing interface (MPI), is used, the parameter communication bottleneck becomes severe; it increases the idle time of computational resources such as GPUs and reduces resource utilization. To address this communication bottleneck in distributed DNN training, we proposed SoftMemoryBox (SMB1), a virtual shared memory framework, in previous work, where we presented its concept, architecture, components, and basic application programming interface (API) [16]. Instead of a general-purpose synchronization function, SMB1 supports a restricted method for parameter updates, the cumulative API, which provides an exclusive accumulation function between two shared memory buffers with the assistance of the SMB server. This requires allocating extra memory buffers to stage temporary data (e.g., the weight differences to be applied).
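
To make the cumulative-update pattern above concrete, the following sketch emulates it with ordinary NumPy arrays and a lock: a worker stages its weight difference in an extra buffer, and an accumulation step (playing the role of the SMB server) adds that difference exclusively into the shared weight buffer. The names and the lock-based exclusivity are illustrative assumptions, not the actual SMB1 cumulative API.

    # Sketch of the cumulative-update pattern (illustrative, not the SMB1 API):
    # a worker stages its weight difference in a separate buffer, and an
    # exclusive accumulation step folds it into the shared weight buffer.
    import threading
    import numpy as np

    NUM_PARAMS = 1_000
    shared_weights = np.zeros(NUM_PARAMS, dtype=np.float32)   # shared memory buffer
    staging_buffer = np.zeros(NUM_PARAMS, dtype=np.float32)   # extra buffer for temporary data
    accumulate_lock = threading.Lock()                        # emulates server-side exclusivity

    def worker_step(gradient, learning_rate=0.01):
        """Compute the weight difference locally and stage it for accumulation."""
        staging_buffer[:] = -learning_rate * gradient

    def accumulate():
        """Emulated server-side step: exclusively add the staged difference."""
        with accumulate_lock:
            shared_weights[:] += staging_buffer

    # One asynchronous update: stage a weight difference, then accumulate it.
    worker_step(np.random.rand(NUM_PARAMS).astype(np.float32))
    accumulate()

The extra staging buffer is precisely the overhead mentioned above: the weight difference has to live somewhere until the accumulation step consumes it.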

RELATED WORK
FIRST EXPERIMENT
SECOND EXPERIMENT
THIRD EXPERIMENT
FOURTH EXPERIMENT
Findings
CONCLUSION