Energy Cost Aware Scheduling of MapReduce Jobs across Geographically Distributed Nodes

Tan N Le, Pradipta De, Bong Jun Choi

doi:10.1145/2847220.2847233

Abstract

The MapReduce framework is designed to distribute computation across a large set of nodes. MapReduce implementations, such as Amazon’s Elastic MapReduce, typically operate on nodes within a single cluster or data center. However, there are benefits to choosing among a set of cloud providers and executing a MapReduce job on geographically distributed private and public clouds. In this poster, we present a technique for choosing geographically distributed nodes to execute a MapReduce job, with the objective of minimizing the total energy cost of completing the job while satisfying Quality of Service (QoS) requirements.

We consider a MapReduce system in a geographically distributed environment that consists of Nd data nodes, Nm mapper nodes, and Nr reducer nodes. Each data node Di (1 ≤ i ≤ Nd) holds an amount of input data di. Each data node is connected to every mapper node Mj (1 ≤ j ≤ Nm), and each mapper node Mj is connected to every reducer node Rk (1 ≤ k ≤ Nr). Compute rates differ across the mapper nodes Mj and the reducer nodes Rk, and electricity prices vary with the location of the MapReduce nodes [1].

The push, map, shuffle, and reduce phases are executed sequentially to complete a MapReduce job. We assume a global barrier between phases: all nodes in one phase must complete execution before any node in the next phase can begin. In our design, when a MapReduce job is submitted, we schedule all the necessary MapReduce nodes, assuming that the regional electricity prices, the compute rates of the MapReduce nodes, and the link bandwidths are known. The MapReduce user specifies the deadline constraint T that satisfies
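Under the global-barrier assumption in the model above, the job completion time is the sum of the four phase durations, and each phase lasts as long as its slowest node or link. The Python sketch below illustrates that calculation together with a deadline check against T. It is not the authors' scheduler, and several pieces are assumptions made only for illustration: the split fractions x and y, the map output ratio alpha, independent links with no contention, time measured in hours, and an energy model in which each busy compute node draws a constant power_kw and only the map and reduce phases are billed.

```python
def evaluate(d, x, push_bw, map_rate, y, shuffle_bw, reduce_rate,
             map_price, reduce_price, alpha=1.0, power_kw=1.0):
    """Phase durations, makespan, and energy cost for one placement.

    d[i]             input data held by data node D_i
    x[i][j]          fraction of d[i] pushed to mapper M_j (assumed input)
    push_bw[i][j]    bandwidth of link D_i -> M_j (data units per hour)
    map_rate[j]      compute rate of mapper M_j (data units per hour)
    y[k]             fraction of map output sent to reducer R_k (assumed)
    shuffle_bw[j][k] bandwidth of link M_j -> R_k
    reduce_rate[k]   compute rate of reducer R_k
    map_price[j], reduce_price[k]  electricity price at each node ($/kWh)
    alpha            map output size / map input size (assumed)
    power_kw         power drawn by a busy node (assumed constant)
    """
    n_d, n_m, n_r = len(d), len(map_rate), len(reduce_rate)

    # Data volume arriving at each mapper and each reducer.
    map_in = [sum(d[i] * x[i][j] for i in range(n_d)) for j in range(n_m)]
    red_in = [alpha * sum(map_in) * y[k] for k in range(n_r)]

    # Global barrier: a phase ends only when its slowest transfer or
    # computation ends, so each phase duration is a maximum.
    push = max(d[i] * x[i][j] / push_bw[i][j]
               for i in range(n_d) for j in range(n_m))
    map_busy = [map_in[j] / map_rate[j] for j in range(n_m)]
    shuffle = max(alpha * map_in[j] * y[k] / shuffle_bw[j][k]
                  for j in range(n_m) for k in range(n_r))
    red_busy = [red_in[k] / reduce_rate[k] for k in range(n_r)]

    phases = (push, max(map_busy), shuffle, max(red_busy))

    # Assumed energy model: busy hours * kW * regional price, compute only.
    cost = power_kw * (sum(t * p for t, p in zip(map_busy, map_price))
                       + sum(t * p for t, p in zip(red_busy, reduce_price)))
    return {"phases": phases, "makespan": sum(phases), "cost": cost}


# Hypothetical instance: 2 data nodes, 2 mappers, 1 reducer.
result = evaluate(
    d=[120.0, 60.0],
    x=[[0.5, 0.5], [1.0, 0.0]],
    push_bw=[[10.0, 20.0], [30.0, 10.0]],
    map_rate=[40.0, 20.0],
    y=[1.0],
    shuffle_bw=[[12.0], [6.0]],
    reduce_rate=[30.0],
    map_price=[0.10, 0.20],
    reduce_price=[0.12],
    alpha=0.5,
)

T = 20.0  # hypothetical user deadline, in hours
meets_deadline = result["makespan"] <= T
print(result, meets_deadline)
```

A scheduler in this setting would compare candidate placements (choices of x and y) by their cost and keep only those whose makespan meets the deadline. The sketch evaluates a single placement and does not perform that search.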
