Deadline-Aware MapReduce Job Scheduling with Dynamic Resource Availability

Dazhao Cheng, Yinggen Xu, Changjun Jiang, Liu Liu, Xiaobo Zhou

doi:10.1109/tpds.2018.2873373

Abstract

As MapReduce becomes ubiquitous in large-scale data analysis, many recent studies have shown that its performance can be improved by different job scheduling approaches, e.g., Fair Scheduler and Capacity Scheduler. However, most existing MapReduce job schedulers assume that the cluster is stable and pay little attention to clusters with dynamic resource availability. In fact, MapReduce cluster resources may fluctuate, as a growing number of Hadoop clusters are deployed on hybrid systems, e.g., infrastructure powered by a mix of traditional and renewable energy, and cloud platforms hosting heterogeneous workloads. Thus, there is a growing need to provide predictable services to users who have strict requirements on job completion times in such dynamic environments. In this paper, we propose RDS, a Resource and Deadline-aware Hadoop job Scheduler that takes future resource availability into consideration when minimizing job deadline misses. We formulate the job scheduling problem as an online optimization problem and solve it using an efficient receding horizon control algorithm. To aid the control, we design a self-learning model to estimate job completion times. We further extend the design of the RDS scheduler to support flexible performance goals in various dynamic clusters. In particular, we use flexible deadline time bounds instead of a single fixed job completion deadline. We have implemented RDS in the open-source Hadoop implementation and performed evaluations with various benchmark workloads. Experimental results show that RDS substantially reduces the penalty of deadline misses by at least 36 and 10 percent compared with Fair Scheduler and Earliest Deadline First (EDF) scheduler, respectively. In a Hadoop cluster running partially on renewable energy, experimental results show that the green-power-based resource prediction approach can further reduce the penalty of deadline misses by 16 percent compared with the Auto-Regressive Integrated Moving Average (ARIMA) prediction approach.
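The receding horizon idea mentioned in the abstract can be illustrated with a small toy sketch. This is not the RDS algorithm: the abstract does not give its formulation, so everything below is an assumption made for illustration, including the function names (`simulate`, `rhc_step`, `run`), the unit of work (one slot for one interval), the tardiness-style penalty, the brute-force search over priority orders, and the premise that the resource forecast is exact. At each interval the controller evaluates every priority order over a short look-ahead window, applies only the first interval of the best plan, and then re-plans.

```python
from itertools import permutations

def simulate(order, work, deadlines, forecast, now, horizon):
    """Predict the penalty of serving jobs in strict priority `order`
    over the look-ahead window; also return the first interval's allocation."""
    left = dict(work)
    first_alloc = {}
    penalty = 0
    for k in range(horizon):
        t = now + k
        # Forecast slots for interval t; assume nothing beyond the forecast.
        cap = forecast[t] if t < len(forecast) else 0
        for name in order:
            give = min(cap, left[name])
            left[name] -= give
            cap -= give
            if k == 0 and give > 0:
                first_alloc[name] = give
        # Hypothetical penalty: one unit per interval a job stays
        # unfinished at or after its deadline.
        penalty += sum(1 for n in order if left[n] > 0 and deadlines[n] <= t + 1)
    return penalty, first_alloc

def rhc_step(work, deadlines, forecast, now, horizon):
    """One control step: pick the order with the lowest predicted penalty
    and return only its allocation for the current interval."""
    active = [n for n, w in work.items() if w > 0]
    candidates = (simulate(order, work, deadlines, forecast, now, horizon)
                  for order in permutations(active))
    return min(candidates, key=lambda result: result[0])[1]

def run(work, deadlines, forecast, horizon=3):
    """Apply the controller interval by interval, assuming the forecast
    is exact. Returns the finish time of each completed job."""
    work = dict(work)
    finish = {}
    for now in range(len(forecast)):
        alloc = rhc_step(work, deadlines, forecast, now, horizon)
        for name, slots in alloc.items():
            work[name] -= slots
            if work[name] == 0:
                finish[name] = now + 1
    return finish

# Toy input: no slots are available in interval 1 (e.g., a dip in supply).
finish = run(work={"A": 3, "B": 2},
             deadlines={"A": 1, "B": 2},
             forecast=[2, 0, 2, 2])
print(finish)  # {'B': 1, 'A': 4}
```

In this toy input, job A cannot meet its deadline under any order, so the look-ahead serves B first and only A misses. Serving strictly by earliest deadline would give the first two slots to A, and both jobs would finish late. Brute-force enumeration is only workable for a handful of jobs; it stands in here for whatever optimization a real scheduler would use.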

Journal: IEEE Transactions on Parallel and Distributed Systems
Publication Date: Apr 1, 2019
Citations: 57

Similar Papers

Resource and Deadline-Aware Job Scheduling in Dynamic Hadoop Clusters
Dazhao Cheng ... Xiaobo Zhou
01 May 2015

Performance modeling in mapreduce environments
Ludmila Cherkasova
14 Mar 2011

Deadline-aware Preemptive Job Scheduling in Hadoop YARN Clusters
Yongqiang Gao ... Kaifeng Zhang
04 May 2022

A constraint programming-based resource allocation and scheduling of map reduce jobs with service level agreement
S Yasmin ... S Jessica Sritha
01 Aug 2017
