LAGC: Lazily Aggregated Gradient Coding for Straggler-Tolerant and Communication-Efficient Distributed Learning.

Jingjing Zhang, Osvaldo Simeone

doi:10.1109/tnnls.2020.2979762

Open Access

https://doi.org/10.1109/tnnls.2020.2979762

Abstract

Gradient-based distributed learning in parameter server (PS) computing architectures is subject to random delays due to straggling worker nodes and to possible communication bottlenecks between PS and workers. Solutions have been recently proposed to separately address these impairments based on the ideas of gradient coding (GC), worker grouping, and adaptive worker selection. This article provides a unified analysis of these techniques in terms of wall-clock time, communication, and computation complexity measures. Furthermore, in order to combine the benefits of GC and grouping in terms of robustness to stragglers with the communication and computation load gains of adaptive selection, novel strategies, named lazily aggregated GC (LAGC) and grouped-LAG (G-LAG), are introduced. Analysis and results show that G-LAG provides the best wall-clock time and communication performance while maintaining a low computational cost, for two representative distributions of the computing times of the worker nodes.
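As a rough illustration of the gradient coding (GC) and grouping ideas mentioned in the abstract, the sketch below uses a fractional-repetition style assignment: workers are split into groups of size s + 1, every worker in a group computes the same sum of partial gradients, and the PS needs only one reply per group, so any s stragglers can be tolerated. This is a generic toy example, not the LAGC or G-LAG scheme of the article; the function names, the scalar quadratic loss, and the data are all assumptions made for illustration.

```python
def partial_gradient(theta, partition):
    """Gradient of 0.5 * (theta * x - y)**2 summed over one data partition (toy loss, assumed)."""
    return sum((theta * x - y) * x for x, y in partition)

def assign_groups(num_workers, s):
    """Split workers into groups of size s + 1; returns the group index of each worker."""
    if num_workers % (s + 1) != 0:
        raise ValueError("num_workers must be divisible by s + 1")
    return [w // (s + 1) for w in range(num_workers)]

def worker_message(theta, partitions, group, s):
    """A worker returns the sum of partial gradients over the s + 1 partitions of its group."""
    own = partitions[group * (s + 1):(group + 1) * (s + 1)]
    return sum(partial_gradient(theta, p) for p in own)

def aggregate(messages, groups, num_groups):
    """PS side: keep one message per group; return None if some group has not replied."""
    per_group = {}
    for worker, msg in messages.items():
        per_group.setdefault(groups[worker], msg)
    if len(per_group) < num_groups:
        return None
    return sum(per_group.values())

# Toy setup (assumed): 4 workers, 4 single-sample partitions, tolerate s = 1 straggler.
partitions = [[(1, 1)], [(2, 3)], [(3, 2)], [(1, 0)]]
theta, s = 2, 1
groups = assign_groups(len(partitions), s)
full = sum(partial_gradient(theta, p) for p in partitions)

# Whichever single worker straggles, the PS still recovers the full gradient.
for straggler in range(len(partitions)):
    messages = {w: worker_message(theta, partitions, groups[w], s)
                for w in range(len(partitions)) if w != straggler}
    assert aggregate(messages, groups, num_groups=2) == full
```

The price of this robustness is redundancy: each worker processes s + 1 partitions instead of one, which is the kind of computation load that the abstract weighs against wall-clock time and communication.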

Journal: IEEE Transactions on Neural Networks and Learning Systems
Publication Date: Apr 6, 2020
Citations: 46
License type: other-oa

Similar Papers

An Adaptive Distributed Source Coding Design for Distributed Learning
Naifu Zhang ... Meixia Tao
20 Oct 2021

A Low-Complexity and Adaptive Distributed Source Coding Design for Model Aggregation in Distributed Learning
Naifu Zhang ... Meixia Tao
IEEE Open Journal of the Communications Society | VOL. 3
01 Jan 2021

Approximate Gradient Coding for Heterogeneous Nodes
Amogh Johri ... Tejas Bodas
17 Oct 2021

Joint Dynamic Grouping and Gradient Coding for Time-Critical Distributed Machine Learning in Heterogeneous Edge Networks
Yingchi Mao ... Jie Wu
IEEE Internet of Things Journal | VOL. 9
15 Nov 2022
