End-to-End Optimization for Geo-Distributed MapReduce

Benjamin Heintz, Ramesh K Sitaraman, Abhishek Chandra, Jon Weissman

doi:10.1109/tcc.2014.2355225

Open Access

https://doi.org/10.1109/tcc.2014.2355225

Abstract

MapReduce has proven remarkably effective for a wide variety of data-intensive applications, but it was designed to run on large single-site homogeneous clusters. Researchers have begun to explore the extent to which the original MapReduce assumptions can be relaxed to accommodate skewed workloads, iterative applications, and heterogeneous computing environments. This paper continues this exploration by applying MapReduce across geo-distributed data over geo-distributed computation resources. Using Hadoop, we show that network and node heterogeneity and the lack of data locality lead to poor performance, because the interaction of MapReduce phases becomes pronounced in the presence of heterogeneous network behavior. To address these problems, we take a two-pronged approach: we first develop a model-driven optimization that serves as an oracle, providing high-level insights. We then apply these insights to design cross-phase optimization techniques that we implement and demonstrate in a real-world MapReduce implementation. Experimental results in both Amazon EC2 and PlanetLab show the potential of these techniques: performance improves by 7-18 percent, depending on the execution environment and application.

Journal: IEEE Transactions on Cloud Computing
Publication Date: Jul 1, 2016
Citations: 48
License type: publisher-specific-oa

Similar Papers

Cross-Phase Optimization in MapReduce
B Heintz ... J Weissman
01 Mar 2013

Cross-Phase Optimization in MapReduce
Benjamin Heintz ... Abhishek Chandra
01 Jan 2014

Scheduling Jobs across Geo-Distributed Datacenters with Max-Min Fairness
Li Chen ... Bo Li
IEEE Transactions on Network Science and Engineering | VOL. 6
01 Jul 2019

Scheduling jobs across geo-distributed datacenters with max-min fairness
Li Chen ... Bo Li
01 May 2017
