Performance Analysis of the Effect of a Combiner on a MapReduce Job

Imran Artwel J Mhlanga, Siti Fatimah Abdul Razak, Afizan Azman, Nazrul M Ahmad

doi:10.1109/scored.2018.8711046

Abstract

MapReduce has been widely deployed as one of the most efficient frameworks for big data processing because it runs on commodity hardware and automatically manages the parallel execution of tasks. During the shuffle phase, however, a large volume of data traffic is generated, which consumes considerable bandwidth and in turn degrades performance. Many efforts have been made to reduce shuffle-phase traffic, the most common being the use of a combiner function, which the Hadoop framework provides by default. This paper presents a performance analysis of the effect of a combiner function on reduce times and reduce shuffle bytes as the number of reduce tasks is varied. The results show that the combiner significantly reduces both the reduce times and the reduce shuffle bytes.
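
To illustrate why a combiner can shrink shuffle traffic, the following is a minimal in-memory sketch of a word-count job run with and without a combiner. It is not the authors' experimental setup and does not use Hadoop. The function names (`map_words`, `combine`, `run_job`) and the sample input are hypothetical, and the number of shuffled records is used only as a rough stand-in for the reduce shuffle bytes counter.

```python
from collections import Counter, defaultdict


def map_words(split):
    """Map phase: emit one (word, 1) pair per word in an input split."""
    return [(word, 1) for word in split.split()]


def combine(pairs):
    """Combiner: locally sum the counts emitted by a single map task."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def run_job(splits, use_combiner):
    """Return (final word counts, number of records sent through the shuffle)."""
    shuffled = []
    for split in splits:
        pairs = map_words(split)
        if use_combiner:
            # Pre-aggregate on the map side before data crosses the network.
            pairs = combine(pairs)
        shuffled.extend(pairs)

    # Reduce phase: sum all counts received for each key.
    totals = defaultdict(int)
    for word, count in shuffled:
        totals[word] += count
    return dict(totals), len(shuffled)


# Hypothetical input: two splits, each handled by one map task.
splits = ["a b a a", "b b a c"]
plain_counts, plain_records = run_job(splits, use_combiner=False)
combined_counts, combined_records = run_job(splits, use_combiner=True)
```

Both runs yield the same final counts, but the run with the combiner sends fewer records through the shuffle. This works because summation is associative and commutative, so partial sums computed on the map side do not change the final result.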

Full Text