Abstract

Consider the following distributed optimization scenario. A worker has access to training data that it uses to compute gradients, while a server decides when to stop the iterative computation based on its target accuracy or delay constraints. The only information the server has about the problem instance is what it receives from the worker over a rate-limited noiseless communication channel. We introduce a technique we call differential quantization (DQ) that compensates past quantization errors so that the descent trajectory of the quantized algorithm follows that of its unquantized counterpart. Assuming the objective function is smooth and strongly convex, we prove that differentially quantized gradient descent (DQ-GD) attains a linear convergence rate of $\max\{\sigma_{\text{GD}}, \rho_{n}2^{-R}\}$, where $\sigma_{\text{GD}}$ is the convergence rate of unquantized gradient descent (GD), $\rho_{n}$ is the covering efficiency of the quantizer, and $R$ is the bitrate per problem dimension $n$. Thus, at any $R \geq \log_{2}(\rho_{n}/\sigma_{\text{GD}})$, the convergence rate of DQ-GD matches that of unquantized GD, i.e., there is no loss due to quantization. We prove a converse showing that no GD-like quantized algorithm can converge faster than $\max\{\sigma_{\text{GD}}, 2^{-R}\}$. Since quantizers exist with $\rho_{n}\rightarrow 1$ as $n\rightarrow\infty$ (Rogers, 1963), DQ-GD is asymptotically optimal. In contrast, naively quantized GD, in which the worker directly quantizes the gradient, attains only $\sigma_{\text{GD}}+\rho_{n}2^{-R}$. Differential quantization also applies to gradient methods with momentum, such as Nesterov's accelerated gradient descent and Polyak's heavy-ball method. For these algorithms as well, if the rate exceeds a certain threshold, the differentially quantized algorithm incurs no loss in convergence rate compared to its unquantized counterpart. Experimental results on both simulated and real-world least-squares problems validate our theoretical analysis.
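As a rough illustration of the error-compensation idea behind DQ (not the paper's exact construction), the sketch below runs an error-feedback-style quantized gradient descent on a least-squares problem: the worker adds its running quantization error back into the vector it quantizes, so the server's trajectory tracks unquantized GD. The uniform scalar quantizer, the dynamic-range parameter, and the helper names (uniform_quantize, dq_gd) are illustrative assumptions, not the covering-efficient quantizer whose efficiency $\rho_{n}$ the paper's guarantees depend on.

    import numpy as np

    def uniform_quantize(v, rate_bits, dyn_range):
        # Toy uniform scalar quantizer: rate_bits bits per coordinate,
        # midpoint reconstruction over [-dyn_range, dyn_range]. A stand-in
        # for the covering-efficient quantizer assumed in the analysis.
        levels = 2 ** rate_bits
        step = 2.0 * dyn_range / levels
        clipped = np.clip(v, -dyn_range, dyn_range - 1e-12)
        idx = np.floor((clipped + dyn_range) / step)
        return (idx + 0.5) * step - dyn_range

    def dq_gd(grad, x0, eta, rate_bits, dyn_range, num_iters):
        # Error-compensation sketch: past quantization error is fed back into
        # the quantizer input so it does not accumulate along the trajectory.
        x = x0.copy()            # iterate maintained by the server (mirrored by the worker)
        err = np.zeros_like(x0)  # worker's accumulated quantization error
        for _ in range(num_iters):
            g = grad(x)                                    # true gradient at the current iterate
            u = g + err                                    # compensate the past quantization error
            q = uniform_quantize(u, rate_bits, dyn_range)  # what is sent over the R-bit channel
            err = u - q                                    # error carried over to the next round
            x = x - eta * q                                # server's gradient-descent step
        return x

    # Example: least squares, f(x) = 0.5 * ||A x - b||^2 (smooth and strongly convex here).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 10))
    b = rng.standard_normal(50)
    grad = lambda x: A.T @ (A @ x - b)
    eta = 1.0 / np.linalg.norm(A, 2) ** 2  # step size = 1 / smoothness constant
    x_hat = dq_gd(grad, np.zeros(10), eta, rate_bits=6, dyn_range=50.0, num_iters=300)
    print(np.linalg.norm(grad(x_hat)))     # residual gradient norm; shrinks as rate_bits grows

For comparison, the naively quantized scheme discussed above would send uniform_quantize(g, ...) directly, without the err feedback term.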
