Abstract

Distributed training is needed to shorten the training time of deep neural networks. However, the communication overhead often hurts performance, especially in distributed computing environments with limited network bandwidth. Gradient compression techniques have therefore been proposed to reduce communication time. However, compression also risks lowering model accuracy and lengthening training time due to compression loss and compression overhead. As a result, compression may not consistently achieve the desired results, and there is limited discussion of when, and which, compression should be used. To address this problem, we propose a performance-driven hybrid compression solution. We make three main contributions. (1) We describe a hybrid compression strategy that chooses the compression method for individual model gradients. (2) We build an offline performance estimator and an online loss monitor to ensure that compression decisions minimize training time without sacrificing model accuracy. (3) Our implementation can be integrated into existing deep learning frameworks and is applicable to a wide range of compression methods. We observed up to 3.6x training speedup compared to other state-of-the-art methods.
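To illustrate the kind of per-gradient decision the abstract describes, the following is a minimal sketch, not the paper's actual implementation: each gradient tensor is assigned the compression method ("none", top-k sparsification, or 8-bit quantization) with the lowest estimated communication-plus-compression time, falling back to uncompressed transfer when an online loss monitor flags accuracy degradation. The bandwidth, per-tensor overhead constants, sparsity ratio, and the `loss_degrading` flag are illustrative assumptions.

```python
# Hypothetical sketch of performance-driven, per-gradient compression selection.
# Constants and cost model are assumptions for illustration only.
import numpy as np

BANDWIDTH_BYTES_PER_S = 1.25e9            # assumed 10 Gbps link
COMPRESS_OVERHEAD_S = {"none": 0.0,       # assumed per-tensor compression costs
                       "topk": 2e-4,
                       "int8": 1e-4}

def estimated_bytes(grad: np.ndarray, method: str, k_ratio: float = 0.01) -> int:
    """Rough payload size for transmitting one gradient under each method."""
    n = grad.size
    if method == "none":
        return n * 4                      # fp32 values
    if method == "topk":
        k = max(1, int(n * k_ratio))
        return k * (4 + 4)                # value + index per kept element
    if method == "int8":
        return n * 1 + 8                  # quantized bytes + scale/zero-point
    raise ValueError(method)

def choose_method(grad: np.ndarray, loss_degrading: bool) -> str:
    """Pick the method minimizing estimated transfer + compression time.
    If the online loss monitor reports accuracy degradation, send this
    tensor uncompressed."""
    if loss_degrading:
        return "none"
    def cost(m: str) -> float:
        return (estimated_bytes(grad, m) / BANDWIDTH_BYTES_PER_S
                + COMPRESS_OVERHEAD_S[m])
    return min(COMPRESS_OVERHEAD_S, key=cost)

if __name__ == "__main__":
    small = np.random.randn(1_000).astype(np.float32)        # e.g. a bias vector
    large = np.random.randn(25_000_000).astype(np.float32)   # e.g. a large layer
    print(choose_method(small, loss_degrading=False))  # small tensor: "none"
    print(choose_method(large, loss_degrading=False))  # large tensor: "topk"
```

In this toy cost model, small tensors are sent uncompressed because the fixed compression overhead outweighs the bandwidth savings, while large tensors favor aggressive sparsification, which mirrors the hybrid, per-gradient behavior the abstract motivates.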
