SKCompress: compressing sparse and nonuniform gradient in distributed machine learning

Jiawei Jiang, Fangcheng Fu, Yingxia Shao, Tong Yang, Bin Cui

doi:10.1007/s00778-019-00596-3

Abstract

Distributed machine learning (ML) has been extensively studied to meet the explosive growth of training data. A wide range of machine learning models are trained by a family of first-order optimization algorithms, i.e., stochastic gradient descent (SGD). The core operation of SGD is the calculation of gradients. When executing SGD in a distributed environment, the workers need to exchange local gradients through the network. To reduce the communication cost, a category of quantification-based compression algorithms is used to transform the gradients into a binary format, at the expense of a small loss of precision. Although the existing approaches work well for dense gradients, we find that they are ill-suited for many cases where the gradients are sparse and nonuniformly distributed. In this paper, we ask: is there a compression framework that can efficiently handle sparse and nonuniform gradients? We propose a general compression framework, called SKCompress, that compresses both the gradient values and the gradient keys of sparse gradients. Our first contribution is a sketch-based method that compresses the gradient values. A sketch is a class of algorithms that approximates the distribution of a data stream with a probabilistic data structure. We first use a quantile sketch to generate splits, sort gradient values into buckets, and encode them with the bucket indexes. Our second contribution is a new sketch algorithm, namely MinMaxSketch, which compresses the bucket indexes. MinMaxSketch builds a set of hash tables and resolves hash collisions with a MinMax strategy. Since the bucket indexes are nonuniform, we further adopt Huffman coding to compress MinMaxSketch. To compress the keys of sparse gradients, the third contribution of this paper is a delta-binary encoding method that calculates the increments of the gradient keys and encodes them in binary format. An adaptive prefix is proposed to assign different sizes to different gradient keys, so that we can save more space. We also theoretically discuss the correctness and the error bound of our proposed methods. To the best of our knowledge, this is the first effort to use data sketches to compress gradients in ML. We implement a prototype system in a real cluster of our industrial partner Tencent Inc. and show that our method is up to $12\times$ faster than the existing methods.
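
To make two of the steps in the abstract more concrete, the following Python sketch illustrates bucket-index encoding of gradient values and delta encoding of gradient keys. It is a simplified illustration under stated assumptions, not the authors' implementation: exact quantiles computed by sorting stand in for a quantile sketch, a plain list of byte widths stands in for the adaptive prefix, and MinMaxSketch and Huffman coding are omitted. All function names are hypothetical.

```python
from bisect import bisect_right


def quantile_splits(values, num_buckets):
    """Equal-frequency split points (assumption: exact quantiles via sorting
    replace the quantile sketch mentioned in the abstract)."""
    ordered = sorted(values)
    n = len(ordered)
    # num_buckets + 1 boundaries, running from the minimum to the maximum.
    return [ordered[min(n - 1, (i * n) // num_buckets)]
            for i in range(num_buckets + 1)]


def encode_values(values, splits):
    """Replace each gradient value with the index of its bucket."""
    last = len(splits) - 2
    return [min(last, max(0, bisect_right(splits, v) - 1)) for v in values]


def decode_values(indexes, splits):
    """Approximate each value by the midpoint of its bucket (lossy)."""
    return [(splits[i] + splits[i + 1]) / 2 for i in indexes]


def encode_keys(keys):
    """Delta-encode sorted, non-negative keys. Each increment is stored in the
    fewest whole bytes that hold it (assumption: a real format would pack the
    width into a short bit prefix instead of a separate list)."""
    widths, payload, prev = [], bytearray(), 0
    for key in keys:
        delta = key - prev
        width = max(1, (delta.bit_length() + 7) // 8)
        widths.append(width)
        payload += delta.to_bytes(width, "big")
        prev = key
    return widths, bytes(payload)


def decode_keys(widths, payload):
    """Rebuild the original keys by summing the stored increments."""
    keys, pos, prev = [], 0, 0
    for width in widths:
        prev += int.from_bytes(payload[pos:pos + width], "big")
        keys.append(prev)
        pos += width
    return keys


if __name__ == "__main__":
    grad_keys = [3, 10, 300, 70000, 70001]
    grad_values = [-0.9, -0.02, 0.0, 0.015, 0.8]
    splits = quantile_splits(grad_values, num_buckets=4)
    indexes = encode_values(grad_values, splits)
    widths, payload = encode_keys(grad_keys)
    print(indexes, decode_values(indexes, splits))
    print(widths, len(payload), decode_keys(widths, payload))
```

Key decoding is exact, while value decoding is lossy: each recovered value differs from the original by at most half the width of its bucket. Equal-frequency splits place narrow buckets where values are dense, which is why this kind of bucketing suits nonuniform gradients.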

Journal: The VLDB Journal
Publication Date: Jan 1, 2020
Citations: 15

Similar Papers

SketchML
Jiawei Jiang ... Fangcheng Fu
27 May 2018

S2 Reducer: High-Performance Sparse Communication to Accelerate Distributed Deep Learning
Keshi Ge ... Yongquan Fu
23 May 2022

Model averaging in distributed machine learning: a case study with Apache Spark
Yunyan Guo ... Zhipeng Zhang
The VLDB Journal | VOL. 30
15 Apr 2021

Convergence analysis of distributed stochastic gradient descent with shuffling
Qi Meng ... Tie-Yan Liu
Neurocomputing | VOL. 337
22 Jan 2019
