Virtual Shuffling for Efficient Data Movement in MapReduce

Weikuan Yu, Cong Xu, Xinyu Que, Yandong Wang

doi:10.1109/tc.2013.216

Abstract

MapReduce is a popular parallel processing framework for large-scale data analytics. To keep up with growing dataset volumes, it requires efficient I/O from the underlying computer systems to process and analyze data in its two phases, mapping and reducing. Between these phases, a shuffling phase globally exchanges the intermediate data generated by the mapping phase. We reveal that data shuffling, by physically moving segments of intermediate data across disks, causes significant I/O contention and compounds the I/O problem. In this paper, we propose a novel virtual shuffling strategy that enables efficient data movement and reduces I/O for MapReduce shuffling, thereby reducing power consumption and conserving energy. Virtual shuffling is realized through a combination of three techniques: a three-level segment table, near-demand merging, and dynamic and balanced merging subtrees. Our experimental results show that virtual shuffling significantly speeds up data movement in MapReduce and shortens job execution time. In particular, by reducing disk I/O accesses, it saves as much as 12% of the power consumed by MapReduce programs.
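The abstract names the three techniques but does not describe how they work internally. The Python sketch below illustrates only the general idea of shuffling by reference rather than by copying, under assumptions of our own. A hypothetical three-level table (node, then map task, then partition) lets a reducer locate its segments without moving them. A deferred read stands in for fetching data only when merging needs it. A plain balanced pairwise merge combines the sorted runs. The names `Segment`, `SegmentTable`, `read_segment`, and `balanced_merge` are illustrative and are not taken from the paper. The balanced merge is a standard pairwise scheme, not the authors' dynamic merging-subtree algorithm.

```python
from dataclasses import dataclass
from heapq import merge
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Segment:
    """Hypothetical descriptor for one sorted segment of map output.

    It records where the data lives, not the data itself, so handing it
    to a reducer is cheap compared with copying the segment between disks.
    """
    node: str       # host holding the map output (assumption)
    map_id: int     # map task that produced the segment
    partition: int  # reduce partition the segment belongs to


class SegmentTable:
    """Hypothetical three-level index: node -> map task -> partition -> Segment."""

    def __init__(self) -> None:
        self._table: Dict[str, Dict[int, Dict[int, Segment]]] = {}

    def register(self, seg: Segment) -> None:
        # Record a reference to the segment; no intermediate data is moved.
        self._table.setdefault(seg.node, {}).setdefault(seg.map_id, {})[seg.partition] = seg

    def segments_for(self, partition: int) -> List[Segment]:
        # Collect references to every segment of one partition, in a stable order.
        found: List[Segment] = []
        for node in sorted(self._table):
            for map_id in sorted(self._table[node]):
                seg = self._table[node][map_id].get(partition)
                if seg is not None:
                    found.append(seg)
        return found


Store = Dict[Tuple[str, int, int], List[int]]


def read_segment(seg: Segment, store: Store) -> List[int]:
    # Stand-in for a disk or network read, issued only when the reducer is ready to merge.
    return store[(seg.node, seg.map_id, seg.partition)]


def balanced_merge(runs: List[List[int]]) -> List[int]:
    # Merge sorted runs pairwise, level by level, so the merge tree stays balanced.
    if not runs:
        return []
    level = [list(run) for run in runs]
    while len(level) > 1:
        next_level: List[List[int]] = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(list(merge(level[i], level[i + 1])))
            else:
                next_level.append(level[i])  # an odd run is carried up unchanged
        level = next_level
    return level[0]


if __name__ == "__main__":
    store: Store = {
        ("n1", 0, 0): [1, 4, 9],
        ("n1", 1, 0): [2, 3],
        ("n2", 2, 0): [5, 8],
        ("n2", 2, 1): [7],
    }
    table = SegmentTable()
    for node, map_id, part in store:
        table.register(Segment(node, map_id, part))
    refs = table.segments_for(0)
    print(balanced_merge([read_segment(s, store) for s in refs]))  # [1, 2, 3, 4, 5, 8, 9]
```

Because the table holds only descriptors, the exchange step passes small records rather than whole segments. The abstract does not say whether the paper's segment table is organized along these same three levels.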

Journal: IEEE Transactions on Computers
Publication Date: Feb 1, 2015
Citations: 22

Similar Papers

Clock-Gating in FPGAs: A Novel and Comparative Evaluation
Yan Zhang ... J Roivainen, 01 Jan 2006

Power consumption and throughput in mobile ad hoc networks using directional antennas
A Nasipuri ... U.R Sappidi, 10 Dec 2002

A Novel Low Power Ternary Multiplier Design using CNFETs
Harita Sirugudi ... Sharvani Gadgil, 01 Jan 2020

Self-Adaptive Run-Time Variable Floating-Point Precision for Iterative Algorithms: A Joint HW/SW Approach
Noureddine Ait Said ... Mounir Benabdenbi, Electronics, Vol. 10, 09 Sep 2021