Abstract

The performance and efficiency of distributed training of Deep Neural Networks (DNNs) depend heavily on the performance of gradient averaging among participating processes, a step bound by communication costs. There are two major approaches to reducing communication overhead: overlapping communication with computation (lossless), and reducing the amount of communication (lossy). The lossless approach works well for linear neural architectures such as VGG and AlexNet, but more recent networks such as ResNet and Inception limit the opportunity for such overlapping. Approaches that reduce the amount of data transferred (lossy) therefore become more suitable. In this paper, we present a novel, explainable lossy method that sparsifies gradients in the frequency domain, together with a new range-based floating-point representation to quantize and further compress gradients. These dynamic techniques strike a balance between compression ratio, accuracy, and computational overhead, and are optimized to maximize performance in heterogeneous environments. Unlike existing works that strive for a higher compression ratio, we stress the robustness of our methods and provide guidance for recovering accuracy after failures. To this end, we prove how the FFT sparsification affects convergence and accuracy, and show that our method is guaranteed to converge when a diminishing θ is used during training; reducing θ can also be used to recover accuracy after a failure. Compared to state-of-the-art lossy methods such as QSGD, TernGrad, and Top-k sparsification, our approach incurs less approximation error and therefore performs better in both wall-clock time and accuracy. On an 8-GPU, InfiniBand-interconnected cluster, our techniques accelerate AlexNet training by up to 2.26x over the no-compression baseline, 1.31x over QSGD, 1.25x over TernGrad, and 1.47x over Top-k sparsification.
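The abstract does not spell out the sparsification algorithm. As a rough illustration of the frequency-domain idea it describes, the NumPy sketch below transforms a gradient with an FFT, zeroes coefficients below a θ-scaled magnitude threshold, and inverse-transforms the result. The function name `fft_sparsify` and the specific thresholding rule (a fraction θ of the peak magnitude) are assumptions made for illustration, not the authors' published algorithm.

```python
import numpy as np

def fft_sparsify(grad, theta):
    """Lossy gradient sparsification in the frequency domain (sketch).

    Keeps only FFT coefficients whose magnitude is at least `theta`
    times the largest magnitude; all others are dropped. NOTE: the
    thresholding rule here is a hypothetical stand-in for the paper's
    method.
    """
    spectrum = np.fft.rfft(grad.ravel())        # real-input FFT of the gradient
    cutoff = theta * np.abs(spectrum).max()     # dynamic, theta-scaled threshold
    mask = np.abs(spectrum) >= cutoff
    kept = spectrum * mask                      # zero out small coefficients
    # A real system would transmit only the surviving (index, value)
    # pairs; we reconstruct locally to expose the lossy approximation.
    approx = np.fft.irfft(kept, n=grad.size).reshape(grad.shape)
    return approx, mask.mean()                  # approximated gradient + density

# Example: synthetic gradient, keeping coefficients >= 10% of the peak.
g = np.random.randn(1024).astype(np.float32)
g_hat, density = fft_sparsify(g, theta=0.1)
print(f"kept {density:.1%} of frequency components")
```

Under this reading, shrinking θ keeps more coefficients and lowers the approximation error, which is consistent with the abstract's claim that reducing θ can be used to recover accuracy after a failure.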
