Abstract

MapReduce is an important programming model for large-scale data-intensive applications such as web indexing, scientific simulation, and data mining. Hadoop is a widely adopted open-source implementation of MapReduce. The partition function is an important component of Hadoop: it splits the outputs of map tasks into chunks that become the input data of reduce tasks. Hadoop's default partition function assumes that cluster nodes are homogeneous and perform work at roughly the same rate, and therefore splits intermediate keys evenly among the reduces. In practice, however, the homogeneity assumption seldom holds, and cluster nodes usually perform work at different rates. In this paper, we design a heterogeneity- and load-aware partition function named the proportional partition function (PPF). Besides the dynamic load of cluster nodes, PPF considers their diversity in capacity, such as CPU processing speed and disk writing speed.
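
The contrast between a uniform partition and a proportional one can be illustrated with a minimal sketch. It is not the PPF algorithm itself: the abstract does not say how PPF combines load, CPU speed, and disk speed, so the per-reduce `weights` below are a hypothetical stand-in for whatever capacity score a node receives, and the function names `uniform_partition` and `proportional_partition` are likewise assumptions made for illustration.

```python
import hashlib
from bisect import bisect_right
from itertools import accumulate


def _unit_hash(key: str) -> float:
    """Map a key to a stable position in [0, 1] using an MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def uniform_partition(key: str, num_reduces: int) -> int:
    """Baseline: every reduce receives roughly the same share of keys."""
    if num_reduces <= 0:
        raise ValueError("num_reduces must be positive")
    return min(int(_unit_hash(key) * num_reduces), num_reduces - 1)


def proportional_partition(key: str, weights: list[float]) -> int:
    """Assign a key to a reduce with a share proportional to its weight.

    `weights` is a hypothetical capacity score per reduce (larger means
    the hosting node is assumed to be faster or less loaded).
    """
    if not weights or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    # Cumulative upper bounds of each reduce's slice of [0, total).
    bounds = list(accumulate(weights))
    point = _unit_hash(key) * bounds[-1]
    # The slice containing `point` selects the reduce; clamp for rounding.
    return min(bisect_right(bounds, point), len(weights) - 1)
```

With weights such as `[1, 1, 2]`, the third reduce is assigned about half of the keys and the other two about a quarter each, whereas `uniform_partition` gives each reduce about a third. Because the assignment depends only on the key and the weights, every map task that uses the same weights sends a given key to the same reduce.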
