Abstract

Because frequent data communication increases training time in distributed training, compressing the communicated data is an effective way to speed up the overall training process. Many existing compression methods, such as Top-K and Random-k, introduce extra time overhead for compression and decompression. Other methods reduce this overhead but generalize poorly; for example, the layer-based compression method LR-SGD (Layer-based Random Stochastic Gradient Descent) cannot be applied to the training of recurrent neural networks. This paper proposes an improved method based on LR-SGD that divides the gradients into virtual layers for compression, enabling efficient training of a variety of networks. To validate the proposed method, experiments were conducted with different neural networks on a simulated cluster. The results demonstrate that the proposed method works well across different networks, with low communication overhead and good convergence.
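To illustrate the distinction the abstract draws, the following is a minimal sketch (not the authors' implementation) contrasting Top-K compression, which pays a per-element selection cost, with a virtual-layer style random selection, assuming the flattened gradient is split into fixed-size chunks and a random subset of chunks is transmitted each step; the names `chunk_size` and `keep_ratio` are hypothetical parameters introduced here for illustration.

```python
import numpy as np

def top_k_compress(grad, k):
    """Keep the k largest-magnitude entries; requires a selection pass over all elements."""
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of the k largest magnitudes
    return idx, flat[idx]

def virtual_layer_compress(grad, chunk_size, keep_ratio, rng):
    """Split the flattened gradient into fixed-size 'virtual layers' and randomly
    keep a fraction of them -- no per-element sorting is needed."""
    flat = grad.ravel()
    n_chunks = int(np.ceil(flat.size / chunk_size))
    n_keep = max(1, int(n_chunks * keep_ratio))
    chosen = rng.choice(n_chunks, size=n_keep, replace=False)
    pieces = []
    for c in chosen:
        start = c * chunk_size
        stop = min(start + chunk_size, flat.size)
        pieces.append((start, flat[start:stop]))   # (offset, values) to transmit
    return pieces

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.standard_normal(10_000)          # stand-in for a flattened gradient
    idx, vals = top_k_compress(g, k=100)
    chunks = virtual_layer_compress(g, chunk_size=500, keep_ratio=0.1, rng=rng)
    print(len(vals), sum(len(v) for _, v in chunks))
```

Because the virtual-layer selection operates on fixed-size chunks of the flattened gradient rather than on the network's actual layer boundaries, the same scheme can in principle be applied to architectures, such as recurrent networks, whose layer structure does not suit the original layer-based selection.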
