Abstract

With the increasing volume of training data and the growing scale of network models, distributed deep learning, which employs multiple workers to train a single model, is becoming more and more popular. However, communication among workers has always been a major challenge, because it can incur large latency and bandwidth consumption. In this paper, we propose an entropy-based gradient compression (EGC) mechanism to reduce communication overhead. EGC selects which gradients to communicate based on the entropy of the gradient items, which allows it to achieve a high compression ratio without sacrificing accuracy. More importantly, EGC is a general and flexible mechanism that can be adopted in different distributed training algorithms. Accordingly, we propose three EGC-based training algorithms for different scenarios, i.e., EGC-DSGD for decentralized training, EGC-PS for centralized training, and EGC-FL for federated training. To improve the accuracy of these algorithms, we also adopt associated mechanisms, including automatic learning rate correction, momentum correction, and residual accumulation. We prove the convergence of EGC analytically and evaluate its performance experimentally. Eight models are trained on popular public datasets (including MNIST, CIFAR-10, Tiny ImageNet, and Penn Treebank) for the tasks of image classification and language modeling. The experimental results show that, compared with existing works, the EGC-based algorithms achieve a gradient compression ratio of roughly 1000x while maintaining similar or even higher accuracy.
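To make the idea of entropy-guided gradient selection with residual accumulation concrete, the following is a minimal illustrative sketch, not the paper's exact EGC rule. It assumes a hypothetical helper entropy_select that estimates each gradient element's information content from a histogram of gradient values (-log p of its bin), keeps only the most informative fraction for communication, and returns the unsent part as a residual to be accumulated locally; the function name, the bin count, and the keep ratio are all assumptions introduced here for illustration.

import torch

def entropy_select(grad: torch.Tensor, bins: int = 64, keep_ratio: float = 0.001):
    """Illustrative sketch: keep the most informative gradient elements.

    Information content of each element is estimated as -log p of the
    histogram bin it falls into; the rarest (most informative) elements
    are selected for communication, and the rest form the local residual.
    """
    flat = grad.flatten()
    lo, hi = flat.min().item(), flat.max().item()
    # Histogram-based estimate of the probability of each value's bin.
    hist = torch.histc(flat, bins=bins, min=lo, max=hi)
    probs = hist / hist.sum()
    # Map each element to its bin, then to its information content.
    edges = torch.linspace(lo, hi, bins + 1, device=flat.device)
    bin_idx = torch.bucketize(flat, edges[1:-1])
    info = -torch.log(probs[bin_idx] + 1e-12)
    # Keep only the top keep_ratio fraction of elements by information.
    k = max(1, int(keep_ratio * flat.numel()))
    keep_idx = torch.topk(info, k).indices
    values = flat[keep_idx]
    residual = flat.clone()
    residual[keep_idx] = 0.0  # unsent gradients are accumulated as residuals
    return keep_idx, values, residual.view_as(grad)

In a training loop, a worker would communicate only (keep_idx, values) and add the returned residual to the next step's gradient before selection, which is the role residual accumulation plays in the algorithms described above.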
