Abstract

MapReduce provides a data-parallel computing framework and has emerged as a popular processing model because of the simplicity it offers developers of big data applications. Data processing applications in many domains, such as search and data mining, are usually developed on the open-source Hadoop implementation of MapReduce or on self-developed MapReduce-like implementations such as Dryad [1] and Ciel [2]. In cloud environments, products like Amazon's Elastic Compute Cloud (EC2) [3] offer MapReduce as a third-party, multi-tenant service. Even within a single company, a number of products may share one MapReduce cluster. A fair and efficient scheduler is therefore crucial both to improve the performance of submitted jobs and to guarantee fairness among users. In practice, however, it is hard to guarantee both fairness and per-job performance, especially when jobs are scheduled without accurate estimates. We show that processor-sharing (PS) schedulers such as Fair Scheduling degrade per-job performance in a multi-user environment. We present a new scheduling policy, Hybrid Parallel pessimistic Fair Schedule Protocol (H-PFSP), that can finish every job no later than the Fair scheduler does. Unlike the Fair scheduler, however, it can improve the per-job performance of MapReduce systems when relatively accurate job progress estimates are available.
