Improving MapReduce performance through data placement in heterogeneous Hadoop clusters

Jiong Xie, Yun Tian, Shu Yin, Adam Manzanares, Zhiyang Ding, Xiaojun Ruan, James Majors, Xiao Qin

doi:10.1109/ipdpsw.2010.5470880

Abstract

MapReduce has become an important distributed processing model for large-scale data-intensive applications such as data mining and web indexing. Hadoop, an open-source implementation of MapReduce, is widely used for short jobs requiring low response time. The current Hadoop implementation assumes that the computing nodes in a cluster are homogeneous. Data locality is not taken into account when launching speculative map tasks, because most maps are assumed to be data-local. Unfortunately, neither the homogeneity assumption nor the data-locality assumption holds in virtualized data centers. We show that ignoring the data-locality issue in heterogeneous environments can noticeably reduce MapReduce performance. In this paper, we address the problem of how to place data across nodes so that each node has a balanced data-processing load. Given a data-intensive application running on a Hadoop MapReduce cluster, our data placement scheme adaptively balances the amount of data stored on each node to achieve improved data-processing performance. Experimental results on two real data-intensive applications show that our data placement strategy can always improve MapReduce performance by rebalancing data across nodes before running a data-intensive application in a heterogeneous Hadoop cluster.
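
To make the idea of a balanced data-processing load concrete, here is a minimal sketch that splits a file's blocks across nodes in proportion to each node's processing speed. It is an illustration only, not the placement algorithm from the paper: the function name `place_blocks`, the per-node speed figures, and the largest-remainder rounding rule are all assumptions introduced here.

```python
def place_blocks(total_blocks, speeds):
    """Split total_blocks across nodes in proportion to relative speed.

    `speeds` maps a node name to a hypothetical relative processing speed
    (for example, blocks processed per minute in a calibration run).
    """
    if total_blocks < 0:
        raise ValueError("total_blocks must be non-negative")
    total_speed = sum(speeds.values())
    if total_speed <= 0:
        raise ValueError("speeds must sum to a positive value")

    # Ideal fractional share of blocks for each node.
    ideal = {node: total_blocks * s / total_speed for node, s in speeds.items()}

    # Start from the rounded-down share, then hand the leftover blocks to the
    # nodes whose ideal share was cut the most (largest-remainder rounding).
    alloc = {node: int(share) for node, share in ideal.items()}
    leftover = total_blocks - sum(alloc.values())
    by_remainder = sorted(speeds, key=lambda n: ideal[n] - alloc[n], reverse=True)
    for node in by_remainder[:leftover]:
        alloc[node] += 1
    return alloc


# Hypothetical cluster: node "a" is three times as fast as node "c".
example = place_blocks(12, {"a": 3, "b": 2, "c": 1})
print(example)
```

With this split, a faster node holds more local data, so under the stated assumption that speed stays constant, all nodes would finish processing their local blocks at roughly the same time.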

