HScheduler: an optimal approach to minimize the makespan of multiple MapReduce jobs

Wenhong Tian,Wutong Yang,Guozhong Li,Rajkumar Buyya

doi:10.1007/s11227-016-1737-4

Abstract

Large-scale MapReduce clusters that routinely process big data bring challenges to the cloud computing. One of the key challenges is to reduce the response time of these MapReduce clusters by minimizing their makespans. It is observed that the order in which these jobs are executed can have a significant impact on their overall makespans and resource utilization. In this work, we consider a scheduling model for multiple MapReduce jobs. The goal is to design a job scheduler that minimizes the makespan of such a set of MapReduce jobs. We exploit classical Johnson model and propose a novel framework HScheduler, which combines features of both classical Johnson's algorithm and MapReduce to minimize the makespan for both offline and online jobs. Our Offline HScheduler reaches the theoretical lower bound (optimum) and Online HScheduler is 2-competitive which is the best-known constant ratio for minimizing the makespan. Through extensive real data tests, we find that HScheduler has better performance than the best-known approach by 10.6---11.7 % on average for offline scheduling and 8---10 % on average for online scheduling. The HScheduler can be applied to improve responsive time, throughput and energy efficiency in cloud computing.

Full Text