Abstract

Edge-cloud jobs are rapidly prevailing in many application domains, posing the challenge of using both resource-strenuous edge devices and elastic cloud resources. Efficient resource allocation on such jobs via scheduling algorithms is essential to guarantee their performance, e.g. latency. Deep reinforcement learning (DRL) is increasingly adopted to make scheduling decisions but faces the conundrum of achieving high rewards at a low training overhead. It is unknown if such a DRL can be applied to timely tune the scheduling algorithms that are adopted in response to fast changing workloads and resources. In this paper, we propose EdgeTuner to effectively leverage DRL to select scheduling algorithms online for edge-cloud jobs. The enabling features of EdgeTuner are sophisticated DRL model that captures complex dynamics of Edge-Cloud jobs/tasks and an effective simulator to emulate the response times of short-running jobs in accordance to dynamically changing scheduling algorithms. EdgeTuner trains DRL agents offline by directly interacting with the simulator. We implement EdgeTuner on Kubernetes scheduler and extensively evaluate it on Kubernetes cluster testbed driven by the production traces. Our results show that EdgeTuner outperforms prevailing scheduling algorithms by achieving significant lower job response time while accelerating DRL training speed by more than 180x.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call