Abstract

The scale of model parameters and datasets is rapidly growing to achieve high accuracy in various areas. Training a large-scale deep neural network (DNN) model requires a huge amount of computation and memory; therefore, parallelization techniques for training large-scale DNN models have attracted attention. A number of approaches have been proposed to parallelize large-scale DNN models, but these schemes lack scalability because of their long communication time and limited worker memory, and they often sacrifice accuracy to reduce communication time. In this work, we propose an efficient parallelism strategy named group hybrid parallelism (GHP) to minimize the training time without any accuracy loss. Two key ideas inspire our approach. First, grouping workers and training them by group reduces unnecessary communication overhead among workers, saving a large amount of network resources when training large-scale networks. Second, mixing data and model parallelism can reduce communication time and mitigate the worker memory issue. Data and model parallelism are complementary to each other, so the training time can be improved when they are combined. We analyze the training time models of data and model parallelism, and based on these models, we derive heuristics that determine the parallelization strategy that minimizes training time. We evaluate group hybrid parallelism against existing parallelism schemes, and our experimental results show that group hybrid parallelism outperforms them.
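
As a concrete illustration of how such a training-time-model-based heuristic could work, the following is a minimal sketch assuming a simplified per-iteration cost model; the function names (`estimate_time`, `choose_strategy`) and cost terms are our own assumptions and are not taken from the paper.

```python
# Hypothetical sketch of a training-time-model-based heuristic for choosing a
# parallelization strategy; the cost terms and names are illustrative, not the
# paper's actual model.

def estimate_time(strategy, workers, compute_time, param_bytes,
                  activation_bytes, bandwidth, worker_mem, model_mem):
    """Estimated per-iteration time for one strategy, or None if it does not fit in memory."""
    if strategy == "data":
        if model_mem > worker_mem:            # every worker holds a full model replica
            return None
        comm = param_bytes / bandwidth        # gradient synchronization per iteration
        return compute_time / workers + comm
    if strategy == "model":
        if model_mem / workers > worker_mem:  # model is partitioned across workers
            return None
        comm = activation_bytes / bandwidth   # activations exchanged between partitions
        return compute_time + comm            # partitions mostly run one after another
    raise ValueError(f"unknown strategy: {strategy}")


def choose_strategy(workers, **costs):
    """Pick the memory-feasible strategy with the smallest estimated iteration time."""
    times = {s: estimate_time(s, workers, **costs) for s in ("data", "model")}
    feasible = {s: t for s, t in times.items() if t is not None}
    return min(feasible, key=feasible.get) if feasible else None


# Example: 8 workers, a model too large to fit on a single worker.
print(choose_strategy(8, compute_time=1.0, param_bytes=4e9, activation_bytes=1e8,
                      bandwidth=1e10, worker_mem=16e9, model_mem=32e9))  # -> "model"
```

GHP's actual worker allocation model (see the sections on determining the system configuration and worker allocation) presumably refines such a comparison by also allowing hybrid splits within groups and amortizing gradient synchronization across groups.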

Highlights

  • The deep-learning technique has received considerable attention for application in various areas such as medical imaging, space imaging, and VR/AR imaging

  • Group hybrid parallelism converges at a rate of O(1/√(Bt) + 1/t) at iteration t [39]. This is because the gradients obtained by group hybrid parallelism have the same value as the gradients acquired by minibatch stochastic gradient descent (SGD) with full batch size B, as proved in Proposition 1 (see the equivalence sketch after this list)

  • In this paper, we addressed the limitations of existing parallelism schemes for training large-scale deep neural network (DNN) models
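
For the second highlight, the equivalence between the GHP gradient and the full-batch minibatch SGD gradient presumably rests on the standard decomposition sketched below (our notation, not the paper's: G groups, each processing a disjoint shard B_i of size B/G of the full minibatch B, with per-sample loss ℓ and parameters w):

\[
\frac{1}{G}\sum_{i=1}^{G}\underbrace{\frac{G}{B}\sum_{x \in \mathcal{B}_i}\nabla \ell(w; x)}_{\text{gradient of group } i}
= \frac{1}{B}\sum_{x \in \mathcal{B}}\nabla \ell(w; x),
\qquad \mathcal{B} = \bigcup_{i=1}^{G}\mathcal{B}_i,\quad |\mathcal{B}_i| = \frac{B}{G}.
\]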


Summary

INTRODUCTION

The deep-learning technique has received considerable attention for application in various areas such as medical imaging, space imaging, and VR/AR imaging. Model parallelism does not require synchronization of parameters and resolves the worker memory limitation issue, but it provides low scalability because of low worker utilization and the communication time of exchanging activation data. We propose a fast and scalable parallelism method for distributed SGD called group hybrid parallelism (GHP) for training large-scale DNN models. The key idea is that dividing workers into groups reduces the activation size, mitigating both the communication time and the device memory limitation that arise during the training of a large-scale DNN model. We propose and evaluate a parallelism scheme for fast and scalable training of large-scale DNNs. Our scheme optimally balances data and model parallelism to minimize the training time and groups workers for scalability. We compare our work with other solutions and find that our scheme outperforms them in terms of scalability and throughput.
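
To make the grouping idea concrete, below is a minimal sketch, under our own assumptions rather than the paper's implementation, of how workers could be partitioned into groups so that model parallelism is used inside each group while gradients are averaged only across groups. The names `Worker`, `make_groups`, `split_batch`, `train_step`, `compute_group_grad`, and `all_reduce_mean` are hypothetical placeholders.

```python
# Minimal sketch (not the paper's implementation) of the grouping idea behind GHP:
# workers inside a group cooperate via model parallelism on one shard of the
# minibatch, and gradients are averaged only across groups (data parallelism),
# so parameter synchronization traffic grows with the number of groups rather
# than with the total number of workers. All names below are illustrative.

from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Worker:
    rank: int
    memory_gb: float

def make_groups(workers: List[Worker], group_size: int) -> List[List[Worker]]:
    """Partition the workers into consecutive groups of group_size."""
    return [workers[i:i + group_size] for i in range(0, len(workers), group_size)]

def split_batch(batch: Sequence, num_groups: int) -> List[Sequence]:
    """Split a minibatch into num_groups disjoint, near-equal shards."""
    size = (len(batch) + num_groups - 1) // num_groups
    return [batch[i:i + size] for i in range(0, len(batch), size)]

def train_step(groups, model_shards, batch,
               compute_group_grad: Callable, all_reduce_mean: Callable):
    """One GHP-style iteration under the assumptions above.

    compute_group_grad(group, model_shards, shard) stands in for a
    model-parallel forward/backward pass inside a group, and
    all_reduce_mean(grads) for the collective that averages per-group
    gradients; both are placeholders for a real engine and communication
    library.
    """
    shards = split_batch(batch, len(groups))          # data parallelism across groups
    grads = [compute_group_grad(g, model_shards, s)   # model parallelism inside a group
             for g, s in zip(groups, shards)]
    return all_reduce_mean(grads)                     # one synchronization participant per group
```

The design point of this structure is that gradient averaging involves one participant per group, which reflects the abstract's claim that grouping reduces unnecessary communication among workers; how each group mixes data and model parallelism is what the paper's worker allocation model determines.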

RELATED WORKS AND PROBLEM DESCRIPTION
SYSTEM OVERVIEW
CONVERGENCE ANALYSIS OF GROUP HYBRID PARALLELISM
DETERMINING SYSTEM CONFIGURATION FOR GROUP HYBRID PARALLELISM
WORKER ALLOCATION MODEL FOR GROUP HYBRID PARALLELISM
WORKER ALLOCATION
EVALUATION
CONCLUSION