Canary: Congestion-aware in-network allreduce using dynamic trees

Daniele De Sensi, Edgar Costa Molero, Salvatore Di Girolamo, Laurent Vanbever, Torsten Hoefler

doi:10.1016/j.future.2023.10.010

Open Access

https://doi.org/10.1016/j.future.2023.10.010

Abstract

The allreduce operation is an essential building block for many distributed applications, ranging from the training of deep learning models to scientific computing. In an allreduce operation, data from multiple hosts is aggregated and the result is then broadcast to every host participating in the operation. Allreduce performance can be improved by a factor of two by aggregating the data directly in the network: switches aggregate data coming from multiple ports before forwarding the partially aggregated result to the next hop. In all existing solutions, each switch needs to know the ports from which it will receive the data to aggregate. This forces packets to traverse a predefined set of switches, which makes these solutions prone to congestion. For this reason, we design Canary, the first congestion-aware in-network allreduce algorithm. Canary uses load balancing algorithms to forward packets on the least congested paths. Because switches do not know from which ports they will receive the data to aggregate, they use timeouts to aggregate the data in a best-effort way. We develop a P4 prototype of Canary and evaluate it on a Tofino switch. We then validate Canary through simulations on large networks, showing performance improvements of up to 40% compared to the state of the art.
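The timeout-based, best-effort aggregation described in the abstract can be illustrated with a toy model. The following is a minimal sketch, not the authors' P4 implementation: the names `Packet`, `ToySwitch`, and `toy_allreduce`, the single-switch topology, the sum reduction, and the timeout values are all assumptions made for illustration. A switch folds arriving contributions into one partial result and forwards it once its timeout has expired; a root then completes the reduction and the result is returned to every host.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    value: float  # partially aggregated data (a sum in this sketch)
    count: int    # number of host contributions folded into `value`


class ToySwitch:
    """Hypothetical best-effort aggregator: holds a partial sum until a timeout expires."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.partial = None   # partial aggregate currently held, if any
        self.deadline = None  # time after which the partial is forwarded

    def receive(self, pkt: Packet, now: float) -> list:
        """Process one arrival; return the packets forwarded to the next hop."""
        forwarded = []
        # The timeout expired before this arrival: forward what we have.
        if self.partial is not None and now >= self.deadline:
            forwarded.append(self.partial)
            self.partial = None
        if self.partial is None:
            # First contribution of a new aggregation window.
            self.partial = Packet(pkt.value, pkt.count)
            self.deadline = now + self.timeout
        else:
            # Arrived in time: fold into the held partial aggregate.
            self.partial = Packet(self.partial.value + pkt.value,
                                  self.partial.count + pkt.count)
        return forwarded

    def flush(self) -> list:
        """Forward whatever is still held (models the final timeout firing)."""
        forwarded = [self.partial] if self.partial is not None else []
        self.partial = None
        return forwarded


def toy_allreduce(arrivals, timeout):
    """arrivals: list of (time, value) pairs, one per host, sorted by time.

    Returns (per-host results, number of packets the switch sent to the root).
    """
    switch = ToySwitch(timeout)
    to_root = []
    for now, value in arrivals:
        to_root += switch.receive(Packet(value, 1), now)
    to_root += switch.flush()
    # The root finishes the reduction over the partial aggregates.
    total = sum(p.value for p in to_root)
    hosts = sum(p.count for p in to_root)
    # Broadcast: every participating host receives the same final value.
    return [total] * hosts, len(to_root)


arrivals = [(0.0, 1.0), (0.5, 2.0), (5.0, 3.0)]
late_results, late_packets = toy_allreduce(arrivals, timeout=1.0)
full_results, full_packets = toy_allreduce(arrivals, timeout=10.0)
```

In this sketch, a short timeout lets the late third contribution miss the aggregation window, so the switch sends two packets to the root instead of one. The final result is the same in both runs; only the amount of traffic reduced inside the network differs, which is the sense in which the aggregation is best-effort.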

Journal: Future Generation Computer Systems
Publication Date: Oct 29, 2023
License type: cc-by
