Reinforcement Learning for Datacenter Congestion Control

Chen Tessler, Shie Mannor, Benjamin Fuhrer, Gal Chechik, Amit Mandelbaum, Gal Dalal, Yuval Shpigelman, Doron Haritan Kazakov

doi:10.1609/aaai.v36i11.21535

Abstract

We approach the task of network congestion control in datacenters using Reinforcement Learning (RL). Successful congestion control algorithms can dramatically improve latency and overall network throughput. To date, no learning-based algorithm has shown practical potential in this domain. Indeed, the most popular recent deployments rely on rule-based heuristics that are tested on a predetermined set of benchmarks. Consequently, these heuristics do not generalize well to previously unseen scenarios. In contrast, we devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks. We overcome challenges such as partial observability, non-stationarity, and multiple objectives, and we show that these challenges prevent standard RL algorithms from operating within this domain. We further propose a policy gradient algorithm that leverages the analytical structure of the reward function to approximate its derivative and improve stability. Our experiments, conducted on a realistic simulator that emulates the behavior of communication networks, show that our method simultaneously improves performance on the multiple metrics considered, compared with the popular algorithms deployed in real datacenters today. Our algorithm is being productized to replace heuristics in some of the largest datacenters in the world.

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jun 28, 2022
Citations: 1

Similar Papers

Reinforcement Learning for Datacenter Congestion Control
Chen Tessler ... Shie Mannor
ACM SIGMETRICS Performance Evaluation Review | VOL. 49
17 Jan 2022

Even Lower Latency, Even Better Fairness: Logistic Growth Congestion Control in Datacenters
Peyman Teymoori ... David Hayes
01 Nov 2016

APCC: Agile and Precise Congestion Control in Datacenters
Renjie Zhou ... Shan Huang
01 Dec 2020

A Deep Reinforcement Learning Framework for Optimizing Congestion Control in Data Centers
Shiva Ketabi ... Hongkai Chen
08 May 2023
