Performance and Cost-Efficient Spark Job Scheduling Based on Deep Reinforcement Learning in Cloud Computing Environments

Muhammed Tawfiqul Islam, Shanika Karunasekera, Rajkumar Buyya

doi:10.1109/tpds.2021.3124670

Abstract

Big data frameworks such as Spark and Hadoop are widely adopted to run analytics jobs in both research and industry. The cloud offers affordable compute resources that are easier to manage. Hence, many organizations are shifting towards a cloud deployment of their big data computing clusters. However, job scheduling is a complex problem in the presence of various Service Level Agreement (SLA) objectives such as monetary cost reduction and job performance improvement. Most existing research does not address multiple objectives together and fails to capture the inherent cluster and workload characteristics. In this article, we formulate the job scheduling problem of a cloud-deployed Spark cluster and propose a novel Reinforcement Learning (RL) model to accommodate the SLA objectives. We develop the RL cluster environment and implement two Deep Reinforcement Learning (DRL) based schedulers in the TF-Agents framework. The proposed DRL-based scheduling agents work at a fine-grained level to place the executors of jobs while leveraging the pricing model of cloud VM instances. In addition, the DRL-based agents can also learn the inherent characteristics of different types of jobs to find a proper placement that reduces both the total cluster VM usage cost and the average job duration. The results show that the proposed DRL-based algorithms can reduce the VM usage cost by up to 30%.
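
To make the two objectives named in the abstract concrete, the following is a minimal sketch of how an executor placement could be scored on VM usage cost and average job duration. It is not the authors' RL formulation or their TF-Agents code. The `VM` and `Job` classes, the prices, the first-fit baseline, the per-VM billing rule, and the weight `beta` are all hypothetical and chosen only for illustration. Job durations are assumed fixed here, whereas in a real cluster they would depend on the placement.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    cores: int
    price_per_hour: float  # hypothetical price, not taken from the paper


@dataclass
class Job:
    name: str
    executors: int           # number of executors requested
    cores_per_executor: int
    duration_hours: float    # assumed fixed; in practice it varies with placement


def place_executors(jobs, vms):
    """First-fit baseline (an assumption, not a DRL agent):
    put each executor on the first VM that still has enough free cores."""
    free = {vm.name: vm.cores for vm in vms}
    placement = {}  # (job name, executor index) -> VM name
    for job in jobs:
        for i in range(job.executors):
            for vm in vms:
                if free[vm.name] >= job.cores_per_executor:
                    free[vm.name] -= job.cores_per_executor
                    placement[(job.name, i)] = vm.name
                    break
            else:
                raise ValueError(f"no capacity for {job.name} executor {i}")
    return placement


def vm_usage_cost(placement, jobs, vms):
    """Toy billing rule: each used VM is charged for the longest job
    that has at least one executor on it."""
    duration = {job.name: job.duration_hours for job in jobs}
    busy_hours = {}
    for (job_name, _), vm_name in placement.items():
        busy_hours[vm_name] = max(busy_hours.get(vm_name, 0.0), duration[job_name])
    price = {vm.name: vm.price_per_hour for vm in vms}
    return sum(price[name] * hours for name, hours in busy_hours.items())


def reward(placement, jobs, vms, beta=0.5):
    """Scalarize the two objectives; beta weights cost against duration."""
    cost = vm_usage_cost(placement, jobs, vms)
    avg_duration = sum(job.duration_hours for job in jobs) / len(jobs)
    return -(beta * cost + (1 - beta) * avg_duration)


vms = [VM("small", cores=4, price_per_hour=0.2), VM("large", cores=8, price_per_hour=0.4)]
jobs = [Job("A", executors=2, cores_per_executor=2, duration_hours=1.0),
        Job("B", executors=1, cores_per_executor=2, duration_hours=0.5)]
placement = place_executors(jobs, vms)
print(placement)
print(vm_usage_cost(placement, jobs, vms), reward(placement, jobs, vms))
```

A learned scheduler would replace `place_executors` with a policy that picks the VM for each executor and would receive a signal like `reward` as feedback. Setting `beta` closer to 1 favors cost, closer to 0 favors job duration.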


Journal: IEEE Transactions on Parallel and Distributed Systems
Publication Date: Jul 1, 2022
Citations: 54

