Abstract

The scale of model parameters and datasets is rapidly growing to achieve high accuracy in various areas. Training a large-scale deep neural network (DNN) model requires a huge amount of computation and memory; therefore, parallelization techniques for training large-scale DNN models have attracted attention. A number of approaches have been proposed to parallelize large-scale DNN models, but these schemes lack scalability because of their long communication time and limited worker memory, and they often sacrifice accuracy to reduce communication time. In this work, we propose an efficient parallelism strategy named group hybrid parallelism (GHP) to minimize the training time without any accuracy loss. Two key ideas inspire our approach. First, grouping workers and training them by group reduces unnecessary communication overhead among workers, saving a large amount of network resources when training large-scale networks. Second, mixing data and model parallelism can reduce communication time and mitigate the worker memory issue. Data and model parallelism are complementary to each other, so the training time can be improved when they are combined. We analyze the training time models of data and model parallelism, and based on these models, we derive heuristics that determine the parallelization strategy that minimizes training time. We evaluate group hybrid parallelism against existing parallelism schemes, and our experimental results show that group hybrid parallelism outperforms them.
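
As a concrete illustration of how such a training-time-model-based heuristic could work, the following is a minimal sketch assuming a simplified per-iteration cost model; the function names (`estimate_time`, `choose_strategy`) and cost terms are our own assumptions and are not taken from the paper.

```python
# Hypothetical sketch of a training-time-model-based heuristic for choosing a
# parallelization strategy; the cost terms and names are illustrative, not the
# paper's actual model.

def estimate_time(strategy, workers, compute_time, param_bytes,
                  activation_bytes, bandwidth, worker_mem, model_mem):
    """Estimated per-iteration time for one strategy, or None if it does not fit in memory."""
    if strategy == "data":
        if model_mem > worker_mem:            # every worker holds a full model replica
            return None
        comm = param_bytes / bandwidth        # gradient synchronization per iteration
        return compute_time / workers + comm
    if strategy == "model":
        if model_mem / workers > worker_mem:  # model is partitioned across workers
            return None
        comm = activation_bytes / bandwidth   # activations exchanged between partitions
        return compute_time + comm            # partitions mostly run one after another
    raise ValueError(f"unknown strategy: {strategy}")


def choose_strategy(workers, **costs):
    """Pick the memory-feasible strategy with the smallest estimated iteration time."""
    times = {s: estimate_time(s, workers, **costs) for s in ("data", "model")}
    feasible = {s: t for s, t in times.items() if t is not None}
    return min(feasible, key=feasible.get) if feasible else None


# Example: 8 workers, a model too large to fit on a single worker.
print(choose_strategy(8, compute_time=1.0, param_bytes=4e9, activation_bytes=1e8,
                      bandwidth=1e10, worker_mem=16e9, model_mem=32e9))  # -> "model"
```

GHP's actual worker allocation model (see the sections on determining the system configuration and worker allocation) presumably refines such a comparison by also allowing hybrid splits within groups and amortizing gradient synchronization across groups.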

Highlights

  • The deep-learning technique has received considerable attention for application in various areas such as medical imaging, space imaging, and VR/AR imaging

  • Group hybrid parallelism converges at a rate of O(1/√(Bt) + 1/t) at iteration t [39]. This is because the gradients obtained by group hybrid parallelism have the same value as the gradients acquired by minibatch stochastic gradient descent (SGD) with full batch size B, as proved in Proposition 1 (see the equivalence sketch after this list)

  • In this paper, we addressed the limitations of existing parallelism schemes for training large-scale deep neural network (DNN) models
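
For the second highlight, the equivalence between the GHP gradient and the full-batch minibatch SGD gradient presumably rests on the standard decomposition sketched below (our notation, not the paper's: G groups, each processing a disjoint shard B_i of size B/G of the full minibatch B, with per-sample loss ℓ and parameters w):

\[
\frac{1}{G}\sum_{i=1}^{G}\underbrace{\frac{G}{B}\sum_{x \in \mathcal{B}_i}\nabla \ell(w; x)}_{\text{gradient of group } i}
= \frac{1}{B}\sum_{x \in \mathcal{B}}\nabla \ell(w; x),
\qquad \mathcal{B} = \bigcup_{i=1}^{G}\mathcal{B}_i,\quad |\mathcal{B}_i| = \frac{B}{G}.
\]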


Summary

INTRODUCTION

The deep-learning technique has received considerable attention for application in various areas such as medical imaging, space imaging, and VR/AR imaging. Model parallelism does not require synchronization of parameters and resolves the worker memory limitation issue, but it provides low scalability because of low worker utilization and the communication time of exchanging activation data. We propose a fast and scalable parallelism method for distributed SGD called group hybrid parallelism (GHP) for training large-scale DNN models. The key idea is that dividing workers into groups reduces the activation size, mitigating both the communication time and the device memory limitation that arise during the training of a large-scale DNN model. We propose and evaluate a parallelism scheme for fast and scalable training of large-scale DNNs. Our scheme optimally balances data and model parallelism to minimize the training time and groups workers for scalability. We compare our work with other solutions and find that our scheme outperforms them in terms of scalability and throughput.
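
To make the grouping idea concrete, below is a minimal sketch, under our own assumptions rather than the paper's implementation, of how workers could be partitioned into groups so that model parallelism is used inside each group while gradients are averaged only across groups. The names `Worker`, `make_groups`, `split_batch`, `train_step`, `compute_group_grad`, and `all_reduce_mean` are hypothetical placeholders.

```python
# Minimal sketch (not the paper's implementation) of the grouping idea behind GHP:
# workers inside a group cooperate via model parallelism on one shard of the
# minibatch, and gradients are averaged only across groups (data parallelism),
# so parameter synchronization traffic grows with the number of groups rather
# than with the total number of workers. All names below are illustrative.

from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Worker:
    rank: int
    memory_gb: float

def make_groups(workers: List[Worker], group_size: int) -> List[List[Worker]]:
    """Partition the workers into consecutive groups of group_size."""
    return [workers[i:i + group_size] for i in range(0, len(workers), group_size)]

def split_batch(batch: Sequence, num_groups: int) -> List[Sequence]:
    """Split a minibatch into num_groups disjoint, near-equal shards."""
    size = (len(batch) + num_groups - 1) // num_groups
    return [batch[i:i + size] for i in range(0, len(batch), size)]

def train_step(groups, model_shards, batch,
               compute_group_grad: Callable, all_reduce_mean: Callable):
    """One GHP-style iteration under the assumptions above.

    compute_group_grad(group, model_shards, shard) stands in for a
    model-parallel forward/backward pass inside a group, and
    all_reduce_mean(grads) for the collective that averages per-group
    gradients; both are placeholders for a real engine and communication
    library.
    """
    shards = split_batch(batch, len(groups))          # data parallelism across groups
    grads = [compute_group_grad(g, model_shards, s)   # model parallelism inside a group
             for g, s in zip(groups, shards)]
    return all_reduce_mean(grads)                     # one synchronization participant per group
```

The design point of this structure is that gradient averaging involves one participant per group, which reflects the abstract's claim that grouping reduces unnecessary communication among workers; how each group mixes data and model parallelism is what the paper's worker allocation model determines.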

RELATED WORKS AND PROBLEM DESCRIPTION
SYSTEM OVERVIEW
CONVERGENCE ANALYSIS OF GROUP HYBRID PARALLELISM
DETERMINING SYSTEM CONFIGURATION FOR GROUP HYBRID PARALLELISM
WORKER ALLOCATION MODEL FOR GROUP HYBRID PARALLELISM
WORKER ALLOCATION
EVALUATION
CONCLUSION