Abstract

Distributed deep learning trains large-scale neural network models on massive datasets using multiple workers. Since workers must frequently communicate with each other to exchange gradients for parameter updates, communication overhead is a major challenge in distributed deep learning. To cope with this challenge, gradient compression has been used to reduce the amount of data exchanged. However, existing compression methods, including both gradient quantization and gradient sparsification, either hurt model performance significantly or compress inefficiently. In this paper, we propose a novel approach, called Standard Deviation based Adaptive Gradient Compression (SDAGC), which simultaneously achieves low communication overhead and high model performance in synchronous training. SDAGC uses the standard deviation of the gradients in each layer of the neural network to dynamically compute a suitable threshold as training progresses. Moreover, several associated techniques, including residual gradient accumulation, local gradient clipping, adaptive learning rate revision, and momentum compensation, are integrated to guarantee model convergence. We verify the performance of SDAGC on various machine learning tasks: image classification, language modeling, and speech recognition. The experimental results show that, compared with other existing works, SDAGC achieves gradient compression ratios from 433× to 2021× with similar or even better accuracy.
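The abstract does not state the exact threshold rule, so the following is only a minimal sketch of the core idea under the assumption that each layer's threshold is a multiple k of the standard deviation of the residual-accumulated gradient; the names sdagc_sparsify and k are illustrative, not from the paper:

```python
import torch

def sdagc_sparsify(grad: torch.Tensor, residual: torch.Tensor, k: float):
    """Sketch of std-based threshold sparsification with residual accumulation.

    `k` is a hypothetical scaling factor; the paper derives its threshold
    adaptively from the training process, which the abstract does not specify.
    """
    # Add back the residual: gradients withheld in earlier iterations.
    acc = grad + residual
    # Per-layer adaptive threshold based on the standard deviation of the
    # accumulated gradients, per the abstract's description.
    threshold = k * acc.std()
    mask = acc.abs() >= threshold
    sparse_grad = acc * mask        # large gradients, communicated this step
    new_residual = acc * ~mask      # small gradients, accumulated locally
    return sparse_grad, new_residual

# Example: sparsify one layer's gradient across two steps.
g = torch.randn(1024)
r = torch.zeros_like(g)
sent, r = sdagc_sparsify(g, r, k=2.0)
```

In this reading, only gradients above the adaptive threshold are exchanged, while the rest accumulate in the residual until they grow large enough to be sent, which is how sparsification can reach high compression ratios without discarding small gradients outright.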
