Abstract

With the rise of big data, more and more users launch computing systems to process large volumes of data in various applications. The scheduling algorithm is crucial to the performance of these processing platforms, especially when they execute a batch of jobs concurrently. Such jobs usually consist of multiple stages, where each stage produces intermediate data that is piped to the next stage for further processing. However, the scheduling problem in a big data computing system differs from the traditional multi-stage job scheduling problem: for any two consecutive stages, the later stage usually starts before the former stage finishes in order to "shuffle" the intermediate data. In this paper, we consider MapReduce/Hadoop as a representative computing system and develop a new strategy named OMO, Optimize MapReduce Overlap with a Good Start (Reduce) and a Good Finish (Map). A MapReduce job contains two consecutive phases, map and reduce, and our general target is to optimize the overlap between them. Our solution includes two new techniques, lazy start of reduce tasks and batch finish of map tasks, which aim to achieve an effective alignment of the two phases based on the characteristics of the MapReduce process. OMO has been implemented on the Hadoop system and evaluated with extensive experiments. The results show that OMO's performance is superior in terms of the total completion time (i.e., makespan) of a batch of jobs.

Highlights

  • In the past few years, we have all witnessed the rise of big data and various processing platforms such as Hadoop [1], Mesos [2] and Spark [3], which have been widely adopted in both academia and industry for various applications

  • This paper aims to establish an efficient scheduling scheme for big data computing systems that improves resource utilization and reduces the makespan

  • This paper studies the scheduling problem in a big data computing system with multiple internal stages, especially in a Hadoop cluster serving a batch of MapReduce jobs

Summary

INTRODUCTION

In the past few years, we have all witnessed the rise of big data and various processing platforms such as Hadoop [1], Mesos [2] and Spark [3], which have been widely adopted in both academia and industry for various applications. This work advances a novel technique, called OMO, that targets optimizing the overlap between the map and reduce stages. This overlapping period plays an essential part in MapReduce processing when the map stage produces large quantities of intermediate data for shuffling. OMO consists of two new strategies: lazy start of reduce tasks and batch finish of map tasks. The former strategy attempts to find the optimal time to launch reduce tasks, ensuring that enough time is allocated for the reduce tasks to shuffle the intermediate data while containers can still be assigned to serve map tasks as much as possible.
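The lazy-start idea above can be illustrated with a small sketch. This is a hypothetical decision rule under an assumed linear timing model (constant map progress rate and shuffle bandwidth); the function name, parameters, and the model itself are illustrative assumptions, not the paper's actual implementation:

```python
def should_start_reduce(remaining_map_work, map_rate,
                        pending_shuffle_bytes, shuffle_rate):
    """Decide whether to launch reduce tasks now (lazy start).

    Under the assumed model, reduce tasks are started only when the
    time still needed to shuffle the produced intermediate data is at
    least the time left in the map phase. Before that point, containers
    keep serving map tasks; starting at that point, the shuffle can
    still finish by the time the map phase ends.
    """
    remaining_map_time = remaining_map_work / map_rate
    remaining_shuffle_time = pending_shuffle_bytes / shuffle_rate
    return remaining_shuffle_time >= remaining_map_time
```

For example, if the map phase needs 10 more time units but shuffling the pending data would take 20 units, the rule starts the reduce tasks; if the shuffle would take only 5 units, it keeps the containers on map tasks.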

RELATED WORK
OUR SOLUTION
1) MOTIVATION
COMBINATION OF THE TWO TECHNIQUES
PERFORMANCE EVALUATION
Findings
CONCLUSION
