Abstract

Since frequent data communication leads to increased time consumption in distributed training, compressing the communicated data is one of the effective ways to speed up the whole training process. Many existing compression methods, such as Top-k and Random-k, introduce extra time overhead for compression and decompression. Other methods reduce this extra overhead but generalize poorly; for example, the layer-based compression method LR-SGD (Layer-based Random Stochastic Gradient Descent) cannot be applied to the training of recurrent neural networks. In this paper, an improved method based on LR-SGD is proposed: the gradients are divided into virtual layers for compression, which allows the method to train various networks efficiently. To validate the proposed method, experiments were conducted with different neural networks on a simulated cluster. The results demonstrate that the proposed method works well on different networks, with low communication overhead and good convergence.
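The abstract does not give implementation details, so the following is only an illustrative sketch of the virtual-layer idea as described above; the chunk size, keep ratio, and function names are assumptions, not the authors' code. The sketch flattens all gradients into one vector, splits it into fixed-size "virtual layers", and transmits only a randomly selected subset of those layers each step.

```python
import torch

def virtual_layer_compress(grads, layer_size=4096, keep_ratio=0.25, generator=None):
    """Illustrative sketch (assumed details): flatten gradients, split them into
    fixed-size virtual layers, and keep a random subset of those layers."""
    flat = torch.cat([g.reshape(-1) for g in grads])      # one flat gradient vector
    chunks = list(torch.split(flat, layer_size))          # virtual layers
    n_keep = max(1, int(len(chunks) * keep_ratio))        # how many layers to send
    idx = torch.randperm(len(chunks), generator=generator)[:n_keep]
    return [chunks[i] for i in idx.tolist()], idx, flat.numel()

def virtual_layer_decompress(kept, idx, total_numel, layer_size=4096):
    """Rebuild a dense gradient vector, zero-filling the layers that were dropped."""
    flat = torch.zeros(total_numel)
    for chunk, i in zip(kept, idx.tolist()):
        start = i * layer_size
        flat[start:start + chunk.numel()] = chunk
    return flat
```

In a distributed setting, sharing the same random generator seed across workers would let every worker agree on which virtual layers to exchange without transmitting the indices; this is one plausible design choice, not necessarily the one used in the paper.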
