Abstract

In edge computing, distributed training of Deep Neural Networks (DNNs) requires exchanging massive gradients between parameter servers and worker nodes, and the resulting communication cost constrains training speed. To overcome this limitation, existing gradient compression algorithms pursue extreme compression ratios at the expense of the accuracy of the trained model. Therefore, new gradient compression techniques are needed that ensure both communication efficiency and model accuracy. This paper introduces Adaptive Sparse Ternary Gradient Compression (ASTC), a scheme that uses the number of gradients in each model layer to decide how gradients are compressed. ASTC establishes a layer-selection criterion based on the number of gradients, compresses only the network layers that meet this criterion, evaluates gradient importance based on entropy to perform adaptive sparsification, and finally applies ternary quantization and a lossless coding scheme to the sparsified gradients. Experimental evaluation on public datasets (MNIST, CIFAR-10, Tiny ImageNet) and deep learning models (CNN, LeNet5, ResNet18) shows that the training efficiency of ASTC is about 1.6, 1.37, and 1.1 times higher than that of Top-1, AdaComp, and SBC, respectively. Furthermore, ASTC improves training accuracy by an average of about 1.9% compared with these approaches.
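
To make the described pipeline concrete, the sketch below illustrates one plausible reading of these steps for a single layer's gradient, using NumPy. The size threshold, the entropy-to-sparsity mapping, and the function names are illustrative assumptions, not the paper's exact criteria or coding scheme, which are defined in the full text.

```python
import numpy as np

def astc_compress_layer(grad, min_params=10_000, bins=64):
    """Hypothetical sketch of ASTC-style compression for one layer.

    Steps (as described in the abstract; thresholds and formulas here
    are illustrative assumptions):
      1. Skip layers whose gradient count is below a size criterion.
      2. Use the entropy of the gradient-magnitude distribution to pick
         an adaptive keep-ratio for sparsification.
      3. Keep only the largest-magnitude gradients (sparse compression).
      4. Ternary-quantize the kept values to {-s, 0, +s}.
    """
    flat = grad.ravel()
    n = flat.size
    if n < min_params:                     # 1. layer-selection criterion (assumed form)
        return None                        #    small layers are left uncompressed

    # 2. entropy of the magnitude histogram, normalized to [0, 1]
    mags = np.abs(flat)
    hist, _ = np.histogram(mags, bins=bins)
    p = hist / n
    p = p[p > 0]
    h = -(p * np.log2(p)).sum() / np.log2(bins)

    # higher entropy -> gradients are more spread out -> keep more of them
    keep_ratio = 0.001 + 0.01 * h          # illustrative mapping, not from the paper
    k = max(1, int(n * keep_ratio))

    # 3. magnitude-based sparsification
    idx = np.argpartition(mags, -k)[-k:]

    # 4. ternary quantization: sign of each kept gradient times a shared scale
    scale = mags[idx].mean()
    signs = np.sign(flat[idx]).astype(np.int8)   # values in {-1, +1}

    # (indices, signs, scale) would then be entropy/run-length coded losslessly
    return idx.astype(np.int32), signs, float(scale)


def astc_decompress_layer(shape, packed):
    """Rebuild a dense gradient from the (indices, signs, scale) triple."""
    dense = np.zeros(int(np.prod(shape)), dtype=np.float32)
    if packed is not None:
        idx, signs, scale = packed
        dense[idx] = signs * scale
    return dense.reshape(shape)


if __name__ == "__main__":
    g = np.random.randn(512, 256).astype(np.float32)
    packed = astc_compress_layer(g)
    g_hat = astc_decompress_layer(g.shape, packed)
    kept = packed[0].size if packed is not None else g.size
    print(f"kept {kept}/{g.size} gradients ({kept / g.size:.2%})")
```

In this reading, only the (index, sign, scale) triple crosses the network, which is what makes the ternary-plus-sparse combination attractive for bandwidth-constrained edge links.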
