Fast DRL-based scheduler configuration tuning for reducing tail latency in edge-cloud jobs

Shilin Wen, Chi Harold Liu, Rui Han, Lydia Y. Chen

doi:10.1186/s13677-023-00465-z


Open Access


Abstract

Edge-cloud applications have become prevalent in recent years and pose the challenge of jointly using resource-constrained edge devices and elastic cloud resources under dynamic workloads. Efficient resource allocation for edge-cloud jobs via cluster schedulers (e.g. the Kubernetes or Volcano scheduler) is essential to guarantee their performance, e.g. tail latency, and such allocation is sensitive to scheduler configurations such as the applied scheduling algorithm and the task restart/discard policy. Deep reinforcement learning (DRL) is increasingly applied to optimize scheduling decisions. However, DRL achieves high rewards only at the cost of dauntingly long training times (e.g. hours or days), making it difficult to tune scheduler configurations online in accordance with dynamically changing edge-cloud workloads and resources. To address this issue, this paper proposes EdgeTuner, a fast scheduler configuration tuning approach that efficiently leverages DRL to reduce the tail latency of edge-cloud jobs. The enabling feature of EdgeTuner is that it effectively simulates the execution of edge-cloud jobs under different scheduler configurations and thus quickly estimates these configurations' influence on job performance. The simulation results allow EdgeTuner to train a DRL agent in a timely manner so that it can properly tune scheduler configurations in dynamic edge-cloud environments. We implement EdgeTuner in both the Kubernetes and Volcano schedulers and extensively evaluate it on real workloads driven by Alibaba production traces. Our results show that EdgeTuner outperforms prevailing scheduling algorithms, achieving much lower tail latency while accelerating DRL training by an average of 151.63×.
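The abstract describes the core loop only at a high level: simulate jobs under candidate scheduler configurations, score each configuration by tail latency, and feed that score to a learner. The sketch below illustrates that loop under heavy simplifications that are assumptions made for illustration, not EdgeTuner's design. A toy non-preemptive multi-worker queue simulator stands in for the paper's simulator, the configuration space is just two queue disciplines (FIFO and shortest-job-first), the workload is synthetic rather than driven by Alibaba traces, and a plain epsilon-greedy bandit replaces the DRL agent. All names (`Job`, `make_workload`, `simulate`, `p99`, `tune`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Job:
    arrival: float   # arrival time in seconds
    duration: float  # service time in seconds


def make_workload(n, seed):
    """Hypothetical synthetic workload: Poisson arrivals, heavy-tailed service times."""
    rng = random.Random(seed)
    t, jobs = 0.0, []
    for _ in range(n):
        t += rng.expovariate(1.0)                            # mean inter-arrival time of 1 s
        jobs.append(Job(t, rng.paretovariate(2.5) * 0.5))    # Pareto-distributed duration
    return jobs


def p99(latencies):
    """Nearest-rank 99th-percentile (tail) latency."""
    s = sorted(latencies)
    return s[max(math.ceil(0.99 * len(s)) - 1, 0)]


def simulate(jobs, policy, workers):
    """Non-preemptive simulation of identical workers; returns latencies in service order."""
    pending = sorted(jobs, key=lambda j: j.arrival)
    free_at = [0.0] * workers                  # time at which each worker becomes idle
    queue, latencies, i = [], [], 0
    while i < len(pending) or queue:
        w = min(range(workers), key=free_at.__getitem__)   # earliest-idle worker
        now = free_at[w]
        if not queue and pending[i].arrival > now:
            now = pending[i].arrival           # worker idles until the next arrival
        while i < len(pending) and pending[i].arrival <= now:
            queue.append(pending[i])           # admit every job that has arrived by now
            i += 1
        if policy == "fifo":
            job = min(queue, key=lambda j: j.arrival)
        else:                                  # "sjf": shortest job first
            job = min(queue, key=lambda j: j.duration)
        queue.remove(job)
        free_at[w] = now + job.duration
        latencies.append(free_at[w] - job.arrival)
    return latencies


def tune(configs, episodes=100, epsilon=0.1, jobs_per_episode=300, workers=2, seed=0):
    """Epsilon-greedy bandit (a stand-in for DRL) with reward = -p99 latency."""
    rng = random.Random(seed)
    value = {c: 0.0 for c in configs}          # running mean reward per configuration
    count = {c: 0 for c in configs}
    for ep in range(episodes):
        if ep < len(configs):
            c = configs[ep]                    # try every configuration once
        elif rng.random() < epsilon:
            c = rng.choice(configs)            # explore
        else:
            c = max(configs, key=value.get)    # exploit the best estimate so far
        jobs = make_workload(jobs_per_episode, seed=ep)    # fresh workload window each episode
        reward = -p99(simulate(jobs, c, workers))
        count[c] += 1
        value[c] += (reward - value[c]) / count[c]
    return max(configs, key=value.get), value


best, values = tune(["fifo", "sjf"])
print(best, values)
```

Because each simulated episode costs milliseconds instead of a real cluster run, the learner can score many configurations quickly. This is the intuition behind using simulation to shorten training, although the sketch makes no attempt to reproduce the speedup reported in the abstract.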


Journal: Journal of Cloud Computing
Publication Date: Jun 17, 2023
Citations: 1
License type: CC BY 4.0

Similar Papers

EdgeTuner: Fast Scheduling Algorithm Tuning for Dynamic Edge-Cloud Workloads and Resources
Rui Han ... Chi Harold Liu
02 May 2022

The Fast and The Frugal: Tail Latency Aware Provisioning for Coping with Load Variations
Adithya Kumar ... Timothy Zhu
20 Apr 2020

Few-to-Many
Md E. Haque ... Ricardo Bianchini
ACM SIGPLAN Notices, Vol. 50, 14 Mar 2015

Few-to-Many
Md E. Haque ... Ricardo Bianchini
ACM SIGARCH Computer Architecture News, Vol. 43, 14 Mar 2015
