Abstract

In this paper, we study how MapReduce configuration parameters affect the network load of fixed-size MapReduce jobs during the shuffle phase, and we propose an analytical method to model this dependency. Our approach consists of three key phases: profiling, modeling, and prediction. In the profiling phase, an application is run several times with different sets of MapReduce configuration parameters (here, the number of map tasks and the number of reduce tasks) to profile its network load in the shuffle phase on a given cluster. In the modeling phase, the relation between these parameters and the network load is modeled by multivariate linear regression. We evaluate our technique with three applications (Word Count, Exim Main log parsing, and TeraSort) on a 5-node private MapReduce cluster.
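The profile, model, and predict steps described above can be illustrated with a minimal sketch. It assumes a first-order model of the form load = b0 + b1 * (map tasks) + b2 * (reduce tasks); the abstract states only that multivariate linear regression is used, so the exact feature set is an assumption. The function names and the profiling numbers are hypothetical and synthetic, not measurements from the paper.

```python
# Minimal sketch: ordinary least squares for
#   load = b0 + b1 * num_map_tasks + b2 * num_reduce_tasks
# The model form, the names, and the data below are illustrative assumptions.

def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("profiling runs do not determine the model")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_network_load_model(map_tasks, reduce_tasks, network_load):
    """Modeling phase: fit [b0, b1, b2] via the normal equations X^T X b = X^T y."""
    rows = [[1.0, float(m), float(r)] for m, r in zip(map_tasks, reduce_tasks)]
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(rows, network_load))
           for i in range(3)]
    return solve_linear_system(xtx, xty)


def predict_network_load(coeffs, num_map_tasks, num_reduce_tasks):
    """Prediction phase: evaluate the fitted model for an unseen configuration."""
    b0, b1, b2 = coeffs
    return b0 + b1 * num_map_tasks + b2 * num_reduce_tasks


# Profiling phase stand-in: synthetic (map tasks, reduce tasks, load) triples.
# Load is in arbitrary units and generated from a known linear rule.
profile_maps = [4, 4, 8, 8, 16, 16]
profile_reduces = [2, 4, 2, 4, 2, 4]
profile_load = [100 + 3 * m + 5 * r
                for m, r in zip(profile_maps, profile_reduces)]

coeffs = fit_network_load_model(profile_maps, profile_reduces, profile_load)
predicted = predict_network_load(coeffs, 12, 3)
print(coeffs, predicted)
```

Because the synthetic data is exactly linear, the fit recovers the generating coefficients up to rounding. Real profiling data would be noisy, and the quality of the fit would need to be checked against held-out runs.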
