A major bottleneck in distributed learning is the communication overhead of exchanging intermediate model updates between the worker nodes and the parameter server. Recently, it has been found that the local gradients of different worker nodes are correlated, so distributed source coding (DSC) can be applied to improve communication efficiency by exploiting this correlation. However, exploiting gradient correlations in distributed learning is highly non-trivial because the correlation is unknown and time-varying. In this paper, we first propose a DSC framework for distributed learning, named successive Wyner-Ziv coding, based on quantization and Slepian-Wolf (SW) coding. We prove that the proposed framework achieves the theoretically minimum communication cost from an information-theoretic perspective. We also propose a low-complexity, adaptive DSC scheme for distributed learning, consisting of a gradient statistics estimator, a rate controller, and a log-likelihood ratio (LLR) computer. The gradient statistics estimator estimates the gradient statistics online using only the quantized gradients from previous iterations, and therefore introduces no extra communication cost. By introducing a semi-analytical Monte Carlo simulation, the computational complexity of the rate controller and the LLR computer is reduced to grow only linearly with the number of worker nodes. Finally, we design a DSC-based distributed learning process and show that the extra delay introduced by DSC does not scale with the number of worker nodes.
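To give a rough sense of how a gradient statistics estimator could run online without extra communication, the sketch below tracks cross-worker second-order statistics from the quantized gradients the parameter server has already received at previous iterations. It is a minimal illustration under our own assumptions (an exponential-moving-average update and the names `GradientStatisticsEstimator` and `ema_decay` are ours, not the paper's), not the estimator actually proposed in the paper.

```python
import numpy as np

# Illustrative sketch only: the abstract does not specify the estimator's form.
# We assume the server keeps an exponential moving average (EMA) of the mean
# and cross-worker covariance of the de-quantized gradients it has already
# received, so no additional bits are exchanged for estimation.

class GradientStatisticsEstimator:
    """Tracks per-worker mean and cross-worker covariance of gradients online."""

    def __init__(self, num_workers, ema_decay=0.9):
        self.ema_decay = ema_decay           # hypothetical smoothing factor
        self.mean = np.zeros(num_workers)    # running per-worker mean
        self.cov = np.eye(num_workers)       # running cross-worker covariance

    def update(self, quantized_grads):
        """quantized_grads: array of shape (num_workers, model_dim),
        the de-quantized gradients received at the previous iteration."""
        sample_mean = quantized_grads.mean(axis=1)
        centered = quantized_grads - sample_mean[:, None]
        sample_cov = centered @ centered.T / quantized_grads.shape[1]
        a = self.ema_decay
        self.mean = a * self.mean + (1 - a) * sample_mean
        self.cov = a * self.cov + (1 - a) * sample_cov
        return self.mean, self.cov
```

Under a jointly Gaussian correlation model, such a covariance estimate could in principle be mapped to per-worker SW coding rates via conditional entropies, which is the kind of input a rate controller would need; the actual estimator, rate controller, and LLR computer designs are those given in the paper.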