Optimized scheduling of multi‐user Map‐Reduce jobs in heterogeneous environment

Perumal Varalakshmi, Sankari Subbiah

doi:10.1002/cpe.7316

Abstract

Map‐Reduce is a programming paradigm widely used for data-intensive applications in distributed computing. Divisible load theory (DLT) is a model for partitioning a divisible load for parallel processing. In this work, a new job scheduler, the virtual job scheduler (VJS), is developed to schedule MapReduce jobs based on DLT. VJS constructs a virtual job set from the queue of jobs awaiting execution by considering the CPU and I/O resource utilization levels of each job. The core of VJS is its partitioning algorithm. Two novel partitioning algorithms are proposed: two level successive partitioning (TLSP) and predictive partitioning (PRED). TLSP applies DLT to the map and reduce phases in succession. The second load partitioning, performed during the comparatively longer reduce phase, exploits the advantages of DLT more fully. PRED is a modification of TLSP that aims to optimize the overall schedule length by taking the idle time of the reducers into account. Evaluation of these two models across various execution environments indicates a marked decrease in the wait time of reducers and, as a result, a significant reduction in the makespan of the whole job. VJS with the PRED partitioning algorithm proves suitable for heterogeneous environments, with higher resource utilization than the default Hadoop MapReduce scheduler.
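
As a rough illustration of the DLT idea the abstract relies on, the sketch below splits a divisible load across heterogeneous workers so that all of them finish at the same time. It is a textbook simplification that ignores communication cost, not the TLSP or PRED algorithm from the paper; the function names `dlt_fractions` and `finish_times` and the per-unit processing times are assumptions made for this example.

```python
from fractions import Fraction


def dlt_fractions(unit_times):
    """Return the share of the load for each worker.

    unit_times[i] is the (hypothetical) time worker i needs to process
    one unit of load. With communication cost ignored, equal finish
    times require each share to be proportional to the worker's speed,
    i.e. to 1 / unit_times[i].
    """
    speeds = [Fraction(1) / Fraction(t) for t in unit_times]
    total_speed = sum(speeds)
    return [s / total_speed for s in speeds]


def finish_times(load, unit_times):
    """Time at which each worker finishes its share of `load`."""
    shares = dlt_fractions(unit_times)
    return [load * share * Fraction(t) for share, t in zip(shares, unit_times)]


if __name__ == "__main__":
    # Three workers; the second and third are 2x and 4x slower than the first.
    times = [1, 2, 4]
    print(dlt_fractions(times))
    print(finish_times(700, times))
```

In this simplified setting the faster worker receives the larger share, and no worker sits idle waiting for the others, which is the property a DLT-based scheduler tries to approach in a heterogeneous cluster.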

Full Text