Qsparse-Local-SGD: Distributed SGD With Quantization, Sparsification, and Local Computations

Debraj Basu, Deepesh Data, Suhas N Diggavi, Can Karakus

doi:10.1109/jsait.2020.2985917

Abstract

The communication bottleneck has been identified as a significant issue in the distributed optimization of large-scale learning models. Several approaches to mitigate this problem have recently been proposed, including different forms of gradient compression and computing local models that are mixed iteratively. In this paper, we propose the Qsparse-local-SGD algorithm, which combines aggressive sparsification with quantization and local computation, together with error compensation that keeps track of the difference between the true and compressed gradients. We propose both synchronous and asynchronous implementations of Qsparse-local-SGD. We analyze the convergence of Qsparse-local-SGD in the distributed setting for smooth non-convex and convex objective functions, and we demonstrate that it converges at the same rate as vanilla distributed SGD for many important classes of sparsifiers and quantizers. We use Qsparse-local-SGD to train ResNet-50 on ImageNet and show that it yields significant savings over the state of the art in the number of bits transmitted to reach a target accuracy.
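The combination of local computation, compression, and error compensation described in the abstract can be illustrated with a small synchronous sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the compressor (top-k sparsification followed by a scaled-sign quantizer), the toy quadratic objective on each worker, and the function names `top_k`, `scaled_sign`, `compensate`, and `train` are all hypothetical choices. Each worker runs a few local gradient steps, compresses its accumulated local update plus its stored error, and keeps whatever the compressor dropped as memory for the next round. The asynchronous variant is not shown.

```python
import math
from typing import List, Tuple

Vector = List[float]


def top_k(v: Vector, k: int) -> Vector:
    """Sparsify: keep the k largest-magnitude entries of v, zero the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]


def scaled_sign(v: Vector) -> Vector:
    """Quantize: map each nonzero entry to sign(entry) * mean |nonzero entry|."""
    nonzero = [abs(x) for x in v if x != 0.0]
    if not nonzero:
        return [0.0] * len(v)
    scale = sum(nonzero) / len(nonzero)
    return [math.copysign(scale, x) if x != 0.0 else 0.0 for x in v]


def compensate(memory: Vector, delta: Vector, k: int) -> Tuple[Vector, Vector]:
    """Compress (memory + delta); the residual the compressor drops becomes the new memory."""
    corrected = [m + d for m, d in zip(memory, delta)]
    sent = scaled_sign(top_k(corrected, k))
    new_memory = [c - s for c, s in zip(corrected, sent)]
    return sent, new_memory


def train(targets: List[Vector], k: int, lr: float, local_steps: int,
          rounds: int) -> Vector:
    """Toy synchronous run: hypothetical worker r minimizes 0.5 * ||x - targets[r]||^2."""
    dim, workers = len(targets[0]), len(targets)
    x = [0.0] * dim                                    # shared (synchronized) model
    memories = [[0.0] * dim for _ in range(workers)]   # one error memory per worker
    for _ in range(rounds):
        total = [0.0] * dim
        for r, c in enumerate(targets):
            local = list(x)
            for _ in range(local_steps):                # local gradient steps, no communication
                local = [xi - lr * (xi - ci) for xi, ci in zip(local, c)]
            delta = [xi - li for xi, li in zip(x, local)]  # accumulated local update (x minus local model)
            sent, memories[r] = compensate(memories[r], delta, k)
            total = [t + s for t, s in zip(total, sent)]
        x = [xi - t / workers for xi, t in zip(x, total)]  # apply the average of compressed updates
    return x
```

Because the dropped residual is carried forward rather than discarded, a coordinate skipped in one round can still be sent later once its accumulated error grows large enough. With one worker, `train([[1.0, 2.0]], k=1, lr=0.1, local_steps=1, rounds=3)` alternates which coordinate it transmits and returns approximately `[0.2, 0.56]`.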

Journal: IEEE Journal on Selected Areas in Information Theory
Publication Date: Apr 10, 2020
Citations: 124
License type: publisher-specific-oa

Similar Papers

Quantization of Distributed Data for Learning
Osama A Hanna ... Suhas Diggavi
IEEE Journal on Selected Areas in Information Theory | VOL. 2
01 Sep 2021

Distributed stochastic nonsmooth nonconvex optimization
Vyacheslav Kungurtsev
Operations Research Letters | VOL. 50
23 Sep 2022

Speeding up the convergence of the Polyak’s Heavy Ball algorithm
Koba Gelashvili ... Lela Alkhazishvili
Transactions of A. Razmadze Mathematical Institute | VOL. 172
12 Apr 2018

Convex optimization of random dynamic voltage and frequency scaling against power attacks
Weize Yu
Integration | VOL. 82
07 Sep 2021
