Abstract

Hadoop is an open-source implementation of MapReduce for processing large datasets in a massively parallel manner. It was designed to execute large-scale jobs across an enormous number of nodes that provide both computing and storage. In practice, however, Hadoop is frequently used to process short jobs, which suffer from poor response times and run inefficiently. To close this gap, this paper analyses the job execution process and identifies the issues that cause short jobs to run inefficiently in Hadoop. Exploiting the multi-wave pattern of task execution that arises when the cluster is overloaded, we develop a mechanism based on resource reuse to optimize short-job execution, reducing the frequency of resource allocation and recovery. Experimental results suggest that the proposed resource-reuse mechanism improves the effectiveness of resource utilization and significantly reduces the runtime of short jobs.
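The abstract's core idea — reusing already-allocated resources across task waves instead of allocating and releasing them per task — can be illustrated with a minimal counting sketch. This is not the paper's implementation; the function, parameters, and the 40-task/10-slot figures below are hypothetical assumptions chosen only to show why reuse cuts allocation overhead when a short job runs in multiple waves.

```python
# Illustrative sketch (not the paper's mechanism): count container
# allocate/release operations for a job whose tasks run in multiple
# "waves" on an overloaded cluster, with and without resource reuse.

def setup_teardown_ops(num_tasks: int, num_slots: int, reuse: bool) -> int:
    """Return the number of allocation + release operations needed."""
    if reuse:
        # Allocate one container per occupied slot, run every wave
        # inside it, then release once at the end of the job.
        return 2 * min(num_tasks, num_slots)
    # Without reuse, each task allocates a fresh container and
    # releases it when the task finishes.
    return 2 * num_tasks

# Hypothetical example: a short job of 40 map tasks on a 10-slot
# cluster runs in 4 waves.
no_reuse = setup_teardown_ops(num_tasks=40, num_slots=10, reuse=False)
with_reuse = setup_teardown_ops(num_tasks=40, num_slots=10, reuse=True)
print(no_reuse, with_reuse)  # 80 vs 20 allocation/release operations
```

Under these assumptions the per-task overhead grows with the number of tasks, while the reuse variant's overhead is bounded by the number of slots, which is why the benefit is largest precisely in the overloaded, multi-wave case the paper targets.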
