Load balancing in reducers for skewed data in MapReduce systems by using scalable simple random sampling

Elaheh Gavagsaz, Hamid Haj Seyyed Javadi, Ali Rezaee

doi:10.1007/s11227-018-2391-9

Abstract

MapReduce has proven to be a highly efficient programming model for processing massive datasets on distributed systems. One of the most important obstacles to MapReduce performance is data skew. Skewed data leads to considerable load imbalance across the reducers and, in turn, to performance degradation. This paper studies how to partition intermediate data efficiently so that the load of all reducers is evened out when the data are skewed. A scalable sampling algorithm is used that obtains a more precise approximation of the key distribution while sampling only a small fraction of the intermediate data. The sample is then used to estimate the overall distribution of the keys. In addition, we propose a sorted-balance algorithm based on the sampling results: the sorted-balance algorithm using scalable simple random sampling (SBaSC). This work not only puts forward a load-balanced partitioning strategy but also proves a significant approximation ratio for SBaSC. The experiments confirm that our solution achieves better execution time and load balancing results.
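The abstract does not give the steps of SBaSC, so the following is only a minimal sketch of the general idea it describes: estimate key frequencies from a small sample, then assign keys to reducers in descending order of estimated load, always choosing the least-loaded reducer. Everything here is an assumption made for illustration. The function names `sample_key_frequencies` and `sorted_balance` are hypothetical, plain Bernoulli sampling stands in for the paper's scalable simple random sampling, and the greedy assignment is the classic sorted-balance heuristic rather than the authors' exact method.

```python
import heapq
import random
from collections import Counter


def sample_key_frequencies(keys, fraction, seed=0):
    """Estimate per-key counts from a Bernoulli sample (assumed stand-in
    for the paper's scalable simple random sampling)."""
    rng = random.Random(seed)
    # Keep each intermediate key independently with probability `fraction`.
    sampled = [k for k in keys if rng.random() < fraction]
    counts = Counter(sampled)
    # Scale sample counts back up to estimate the full-data counts.
    return {k: c / fraction for k, c in counts.items()}


def sorted_balance(estimated_load, num_reducers):
    """Greedy sorted-balance: heaviest key first, to the least-loaded reducer."""
    # Min-heap of (current load, reducer id).
    heap = [(0.0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    assignment = {}
    # Sort by descending load; break ties by key text for determinism.
    ordered = sorted(estimated_load.items(), key=lambda kv: (-kv[1], str(kv[0])))
    for key, load in ordered:
        total, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (total + load, reducer))
    return assignment


if __name__ == "__main__":
    # Skewed toy data: one hot key and a long tail of rare keys.
    data = ["hot"] * 5000 + [f"k{i % 50}" for i in range(5000)]
    estimate = sample_key_frequencies(data, fraction=0.05, seed=42)
    plan = sorted_balance(estimate, num_reducers=4)
    print(len(plan), "sampled keys assigned to 4 reducers")
```

Keys that never appear in the sample receive no entry in the assignment. In this sketch they would need a fallback such as ordinary hash partitioning, which is again an assumption and not something the abstract states.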

Journal: The Journal of Supercomputing
Publication Date: Apr 30, 2018
Citations: 13

Similar Papers

Load balancing in join algorithms for skewed data in MapReduce systems
Elaheh Gavagsaz ... Ali Rezaee
The Journal of Supercomputing | VOL. 75
01 Sep 2018

Load Balancing Algorithm in Distributed System: A Survey of Approaches and Limitations
...
Indian Journal of Science and Technology | VOL. 9
16 Jun 2016

Idempotent Task Cache System for Handling Intermediate Data Skew in MapReduce on Cloud Computing
Tzu-Chi Huang ... Ce-Kuen Shieh
01 Dec 2016

Dynamic load balancing of iterative data parallel problems on a workstation cluster
Hye-Seon Maeng ... Shin-Dug Kim
28 Apr 1997
