Abstract

With the increasing volume of training data and the growing scale of network models, distributed deep learning, which employs multiple workers to train a single model, is becoming more and more popular. However, communication among workers has always been a major challenge, because it can incur large latency and bandwidth consumption. In this paper, we propose an entropy-based gradient compression (EGC) mechanism to reduce communication overhead. EGC selects which gradients to communicate based on the entropy of the gradient items, which allows it to achieve a high compression ratio without sacrificing accuracy. More importantly, EGC is a general and flexible mechanism that can be adopted in different distributed training algorithms. Accordingly, we propose three EGC-based training algorithms for different scenarios, i.e., EGC-DSGD for decentralized training, EGC-PS for centralized training, and EGC-FL for federated training. To improve the accuracy of these algorithms, we also adopt associated mechanisms, including automatic learning rate correction, momentum correction, and residual accumulation. We prove the convergence of EGC analytically and evaluate its performance experimentally. Eight models are trained on popular public datasets (including MNIST, CIFAR-10, Tiny ImageNet, and Penn Treebank) for the tasks of image classification and language modeling. The experimental results show that, compared with existing works, the EGC-based algorithms achieve a gradient compression ratio of roughly 1000x while maintaining similar or even higher accuracy.
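To make the idea of entropy-guided gradient selection with residual accumulation concrete, the following is a minimal illustrative sketch, not the paper's exact EGC rule. It assumes a hypothetical helper entropy_select that estimates each gradient element's information content from a histogram of gradient values (-log p of its bin), keeps only the most informative fraction for communication, and returns the unsent part as a residual to be accumulated locally; the function name, the bin count, and the keep ratio are all assumptions introduced here for illustration.

import torch

def entropy_select(grad: torch.Tensor, bins: int = 64, keep_ratio: float = 0.001):
    """Illustrative sketch: keep the most informative gradient elements.

    Information content of each element is estimated as -log p of the
    histogram bin it falls into; the rarest (most informative) elements
    are selected for communication, and the rest form the local residual.
    """
    flat = grad.flatten()
    lo, hi = flat.min().item(), flat.max().item()
    # Histogram-based estimate of the probability of each value's bin.
    hist = torch.histc(flat, bins=bins, min=lo, max=hi)
    probs = hist / hist.sum()
    # Map each element to its bin, then to its information content.
    edges = torch.linspace(lo, hi, bins + 1, device=flat.device)
    bin_idx = torch.bucketize(flat, edges[1:-1])
    info = -torch.log(probs[bin_idx] + 1e-12)
    # Keep only the top keep_ratio fraction of elements by information.
    k = max(1, int(keep_ratio * flat.numel()))
    keep_idx = torch.topk(info, k).indices
    values = flat[keep_idx]
    residual = flat.clone()
    residual[keep_idx] = 0.0  # unsent gradients are accumulated as residuals
    return keep_idx, values, residual.view_as(grad)

In a training loop, a worker would communicate only (keep_idx, values) and add the returned residual to the next step's gradient before selection, which is the role residual accumulation plays in the algorithms described above.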
