Dynamic Stale Synchronous Parallel Distributed Training for Deep Learning

Xing Zhao, Aijun An, Junfeng Liu, Bao Xin Chen

doi:10.1109/icdcs.2019.00150

Abstract

Deep learning is a popular machine learning technique and has been applied to many real-world problems, ranging from computer vision to natural language processing. However, training a deep neural network is very time-consuming, especially on big data. It has become difficult for a single machine to train a large model over large datasets. A popular solution is to distribute and parallelize the training process across multiple machines using the parameter server framework. In this paper, we present a distributed paradigm on the parameter server framework called Dynamic Stale Synchronous Parallel (DSSP), which improves the state-of-the-art Stale Synchronous Parallel (SSP) paradigm by dynamically determining the staleness threshold at run time. Conventionally, to run distributed training with SSP, the user needs to specify a particular staleness threshold as a hyper-parameter. However, users usually do not know how to set the threshold and thus often find a value through trial and error, which is time-consuming. Based on workers' recent processing time, DSSP adaptively adjusts the threshold per iteration at run time to reduce the time that faster workers wait for synchronization of the globally shared parameters (the weights of the model). This increases the frequency of parameter updates (the iteration throughput), which speeds up convergence. We compare DSSP with other paradigms, namely Bulk Synchronous Parallel (BSP), Asynchronous Parallel (ASP), and SSP, by running deep neural network (DNN) models over GPU clusters in both homogeneous and heterogeneous environments. The results show that in a heterogeneous environment, where the cluster consists of mixed models of GPUs, DSSP converges to a higher accuracy much earlier than SSP and BSP and performs similarly to ASP. In a homogeneous distributed cluster, DSSP has more stable and slightly better performance than SSP and ASP, and converges much faster than BSP.
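The idea of a per-worker staleness threshold that follows recent processing time can be illustrated with a small sketch. The abstract does not give the rule DSSP actually uses, so the ratio-based rule below, the class name `ToyParameterServer`, and the bounds `s_min` and `s_max` are all hypothetical assumptions made for illustration; only the SSP-style check (a worker may run ahead of the slowest worker by at most the threshold) reflects the general paradigm described above.

```python
class ToyParameterServer:
    """Toy clock bookkeeping for an SSP-style server (illustrative only)."""

    def __init__(self, num_workers, s_min=1, s_max=4):
        # s_min and s_max are hypothetical bounds on the staleness threshold.
        self.s_min = s_min
        self.s_max = s_max
        self.clocks = [0] * num_workers              # iterations completed per worker
        self.recent_iter_time = [1.0] * num_workers  # last measured seconds per iteration

    def push(self, worker, iter_time):
        """Record that `worker` finished one iteration taking `iter_time` seconds."""
        self.clocks[worker] += 1
        self.recent_iter_time[worker] = iter_time

    def threshold_for(self, worker):
        """Assumed rule, NOT the paper's: a worker that is k times faster than
        the slowest worker gets about k - 1 extra iterations of slack,
        clamped to the range [s_min, s_max]."""
        slowest = max(self.recent_iter_time)
        extra = int(slowest / self.recent_iter_time[worker]) - 1
        return max(self.s_min, min(self.s_max, self.s_min + extra))

    def may_proceed(self, worker):
        """SSP-style check: the worker may start its next iteration only if it
        leads the slowest worker by at most its current threshold."""
        lead = self.clocks[worker] - min(self.clocks)
        return lead <= self.threshold_for(worker)


ps = ToyParameterServer(num_workers=2)
ps.push(1, 2.0)            # worker 1 is slow: 2.0 s per iteration
for _ in range(5):
    ps.push(0, 0.5)        # worker 0 is fast: 0.5 s per iteration
print(ps.threshold_for(0), ps.may_proceed(0))
```

With a fixed threshold of 1, worker 0 in this toy run would already be blocked; under the assumed rule it receives more slack because it is four times faster than worker 1, which is the kind of waiting-time reduction the abstract describes.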


