RAPID: Enabling fast online policy learning in dynamic public cloud environments

Drew Penney, Bin Li, Lizhong Chen, Jaroslaw J Sydir, Anna Drewek-Ossowicka, Ramesh Illikkal, Charlie Tai, Ravi Iyer, Andrew Herdrich

doi:10.1016/j.neucom.2023.126737

Abstract

Resource sharing between multiple workloads has become a prominent practice among cloud service providers, motivated by demand for improved resource utilization and reduced cost of ownership. Effective resource sharing, however, remains an open challenge due to the adverse effects that resource contention can have on high-priority, user-facing workloads with strict Quality of Service (QoS) requirements. Although recent approaches have demonstrated promising results, they remain largely impractical in public cloud environments, where workloads are not known in advance and may run only briefly, which prohibits offline learning and significantly hinders online learning. In this paper, we propose RAPID, a novel framework for fast, fully online learning of resource allocation policies in highly dynamic operating environments. RAPID leverages lightweight QoS predictions, enabled by domain-knowledge-inspired techniques for sample efficiency and bias reduction, to decouple control from conventional feedback sources and guide policy learning at a rate orders of magnitude faster than prior work. Evaluation on a real-world server platform with representative cloud workloads confirms that RAPID can learn stable resource allocation policies in minutes, compared with hours for the prior state of the art, while improving QoS by 9.0× and increasing best-effort workload performance by 19%–43%.

Journal: Neurocomputing
