Abstract

MapReduce is an important distributed programming model for large-scale data-parallel applications such as web indexing, data mining, and scientific simulation. Hadoop is an open-source implementation of MapReduce that is often applied to short jobs for which low response time is critical. Hadoop performs well when the cluster nodes are homogeneous, but in practice this homogeneity assumption does not always hold. In a heterogeneous environment, nodes vary greatly in computation capacity, communication capacity, architecture, memory, and power. When such nodes are each given the same amount of data to process, a load imbalance arises: the slower nodes finish later and delay the whole job. In this paper we address the problem of how to assign data after the Map phase so that the execution times of the Reduce tasks are balanced. We propose a novel load balancing algorithm based on node performance (LBNP), in which the input data assigned to poorly performing nodes is decreased. Simulation results indicate that all the Reduce tasks can be completed at the same time, which shortens the whole Reduce phase. Thus the efficiency of MapReduce is improved.

Keywords: MapReduce; Hadoop; Load balance; Heterogeneous environment; Node performance
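The abstract does not state how LBNP computes each node's share, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each node has a single positive performance score (records processed per unit time) and splits the Reduce input in proportion to that score, so that slower nodes receive less data. The names `assign_by_performance`, `reduce_times`, and the example scores are hypothetical.

```python
from fractions import Fraction
from typing import Dict


def assign_by_performance(total_records: int, speeds: Dict[str, float]) -> Dict[str, int]:
    """Split total_records across nodes in proportion to their speed.

    Assumption (not from the paper): 'speed' is one scalar per node,
    measured in records per unit time.
    """
    if total_records < 0:
        raise ValueError("total_records must be non-negative")
    if not speeds or any(s <= 0 for s in speeds.values()):
        raise ValueError("speeds must be non-empty and strictly positive")

    # Exact rational arithmetic avoids floating-point rounding in the shares.
    weights = {node: Fraction(s) for node, s in speeds.items()}
    total_weight = sum(weights.values())

    # Each node first gets the floor of its proportional share.
    shares = {node: (total_records * w) // total_weight for node, w in weights.items()}

    # Records lost to flooring go to the fastest nodes, one record each.
    leftover = total_records - sum(shares.values())
    for node in sorted(speeds, key=speeds.get, reverse=True)[:leftover]:
        shares[node] += 1
    return shares


def reduce_times(shares: Dict[str, int], speeds: Dict[str, float]) -> Dict[str, float]:
    """Estimated Reduce time per node under the linear-speed assumption."""
    return {node: shares[node] / speeds[node] for node in shares}


if __name__ == "__main__":
    speeds = {"fast": 4.0, "medium": 2.0, "slow": 1.0}  # hypothetical scores
    shares = assign_by_performance(700, speeds)
    print(shares)
    print(reduce_times(shares, speeds))
```

Under this linear model, an equal split of 700 records (about 233 each) would leave the slow node running roughly four times longer than the fast one, whereas the proportional split gives every node the same estimated finishing time. A real scheduler would also have to respect key grouping, since all records for one key must go to the same Reduce task.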

