Abstract

With the growth of artificial intelligence (AI) applications, large volumes of data are generated by mobile and IoT devices at the edge of the network, and deep learning tasks are executed to extract useful information from this user data. However, edge nodes are heterogeneous and network bandwidth is limited in this setting, which makes conventional distributed deep learning inefficient. In this paper, we propose Group Synchronous Parallel (GSP), which uses a density-based algorithm to group edge nodes with similar training speeds. To eliminate stragglers, group parameter servers coordinate the communication of the nodes within each group using Stale Synchronous Parallel and aggregate their gradients, while a global parameter server aggregates the gradients from the group parameter servers to update the global model. To save network bandwidth, we further propose Grouping Dynamic Sparsification (GDS), which dynamically adjusts each node's gradient sparsification rate on top of GSP, differentiating communication volume so that the training speeds of all nodes converge. We evaluate GSP and GDS on LeNet-5, ResNet, VGG, and Seq2Seq with Attention. The experimental results show that GSP speeds up training by 45%–120% with 16 nodes, and GDS on top of GSP recovers part of the test accuracy loss, up to 0.82% for LeNet-5.
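To illustrate the GDS idea described above, the following is a minimal sketch, not the paper's implementation: each node keeps only the largest-magnitude gradient entries, and its keep-rate is scaled by how fast it is relative to its group, so slower nodes transmit less. The function names (`dynamic_keep_rate`, `sparsify`) and the specific scaling rule are hypothetical assumptions for illustration.

```python
# Hypothetical sketch of grouping dynamic sparsification (not the paper's code):
# a node's top-k keep-rate shrinks when the node is slower than its group,
# so per-iteration communication time roughly equalizes across nodes.
import numpy as np

def dynamic_keep_rate(node_time, group_times, base_rate=0.01,
                      min_rate=0.001, max_rate=0.1):
    """Scale the keep-rate by this node's speed relative to the group mean
    (faster node -> larger keep-rate). The linear scaling is an assumption."""
    speed_ratio = np.mean(group_times) / node_time
    return float(np.clip(base_rate * speed_ratio, min_rate, max_rate))

def sparsify(grad, keep_rate):
    """Keep only the largest-magnitude entries of a flattened gradient;
    return their indices and values (what the node would transmit)."""
    flat = grad.ravel()
    k = max(1, int(keep_rate * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

# Example: within one group, a slow node sends fewer gradient entries.
grad = np.random.randn(1_000_000)
group_iter_times = [0.9, 1.0, 1.1, 2.0]   # seconds per iteration, per node
for t in group_iter_times:
    rate = dynamic_keep_rate(t, group_iter_times)
    idx, vals = sparsify(grad, rate)
    print(f"iter_time={t:.1f}s  keep_rate={rate:.4f}  sent={idx.size} values")
```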
