Reinforcement Learning for Datacenter Congestion Control

Chen Tessler, Gal Dalal, Yuval Shpigelman, Doron Haritan Kazakov, Gal Chechik, Benjamin Fuhrer, Amit Mandelbaum, Shie Mannor

doi:10.1145/3512798.3512815

Abstract

We approach the task of network congestion control in datacenters using Reinforcement Learning (RL). Successful congestion control algorithms can dramatically improve latency and overall network throughput. To date, no such learning-based algorithms have shown practical potential in this domain. Instead, the most popular recent deployments rely on rule-based heuristics that are tested on a predetermined set of benchmarks. Consequently, these heuristics do not generalize well to newly seen scenarios. In contrast, we devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks. We overcome challenges such as partial observability, nonstationarity, and multiple objectives. We further propose a policy gradient algorithm that leverages the analytical structure of the reward function to approximate its derivative and improve stability. We show that this scheme outperforms alternative popular RL approaches and generalizes to scenarios that were not seen during training. Our experiments, conducted on a realistic simulator that emulates the behavior of communication networks, show improved performance on the multiple considered metrics simultaneously, compared to the popular algorithms deployed today in real datacenters. Our algorithm is being productized to replace heuristics in some of the largest datacenters in the world.

Journal: ACM SIGMETRICS Performance Evaluation Review
Publication Date: Jan 17, 2022
Citations: 12

Similar Papers

Reinforcement Learning for Datacenter Congestion Control
Chen Tessler ... Doron Haritan Kazakov
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 36
28 Jun 2022

Even Lower Latency, Even Better Fairness: Logistic Growth Congestion Control in Datacenters
Peyman Teymoori ... David Hayes
01 Nov 2016

APCC: Agile and Precise Congestion Control in Datacenters
Renjie Zhou ... Shan Huang
01 Dec 2020

A Deep Reinforcement Learning Framework for Optimizing Congestion Control in Data Centers
Shiva Ketabi ... Hongkai Chen
08 May 2023
