Abstract

Since frequent data communication leads to increased time consumption in distributed training, compressing the communicated data is one of the effective ways to speed up the whole training process. Many existing compression methods, such as Top-k and Random-k, introduce extra time overhead for compression and decompression. Other methods reduce this extra overhead but generalize poorly; for example, the layer-based compression method LR-SGD (Layer-based Random Stochastic Gradient Descent) cannot be applied to the training of recurrent neural networks. In this paper, an improved method based on LR-SGD is proposed: the gradients are divided into virtual layers for compression, which allows the method to train various networks efficiently. To validate the proposed method, experiments were conducted with different neural networks on a simulated cluster. The results demonstrate that the proposed method works well on different networks, with low communication overhead and good convergence.
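The abstract does not give implementation details, so the following is only an illustrative sketch of the virtual-layer idea as described above; the chunk size, keep ratio, and function names are assumptions, not the authors' code. The sketch flattens all gradients into one vector, splits it into fixed-size "virtual layers", and transmits only a randomly selected subset of those layers each step.

```python
import torch

def virtual_layer_compress(grads, layer_size=4096, keep_ratio=0.25, generator=None):
    """Illustrative sketch (assumed details): flatten gradients, split them into
    fixed-size virtual layers, and keep a random subset of those layers."""
    flat = torch.cat([g.reshape(-1) for g in grads])      # one flat gradient vector
    chunks = list(torch.split(flat, layer_size))          # virtual layers
    n_keep = max(1, int(len(chunks) * keep_ratio))        # how many layers to send
    idx = torch.randperm(len(chunks), generator=generator)[:n_keep]
    return [chunks[i] for i in idx.tolist()], idx, flat.numel()

def virtual_layer_decompress(kept, idx, total_numel, layer_size=4096):
    """Rebuild a dense gradient vector, zero-filling the layers that were dropped."""
    flat = torch.zeros(total_numel)
    for chunk, i in zip(kept, idx.tolist()):
        start = i * layer_size
        flat[start:start + chunk.numel()] = chunk
    return flat
```

In a distributed setting, sharing the same random generator seed across workers would let every worker agree on which virtual layers to exchange without transmitting the indices; this is one plausible design choice, not necessarily the one used in the paper.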
