Abstract

Many prior works have investigated how to increase job processing performance and energy efficiency in large-scale clusters. However, they employ serialized scheduling approaches combined with task straggler “hunting” techniques, which launch speculative tasks after detecting slow tasks. These slow tasks are detected through node instrumentation, which collects system-level information while tracking task execution progress. Such approaches are detrimental to achieving maximum processing performance and preserving cluster energy because they increase communication overheads. In this paper, we observe that node instrumentation and serialized scheduling in existing works not only degrade job processing performance but also increase cluster energy consumption. To alleviate this, we propose EPPADS, a lightweight scheduler that eliminates the need for instrumentation modules for job scheduling purposes. EPPADS schedules tasks in two stages: the slow-start phase (SSP) and the accelerate phase (AccP). The SSP schedules the initial tasks in the queue using baseline FIFO scheduling and records the initial execution times of the processing nodes, while tagging the effective and straggling nodes. The AccP uses the initial execution times to compute each processing node's task distribution ratio for the remaining tasks and schedules them in parallel using a single scheduling I/O, boosting processing performance. To amortize computing energy costs, EPPADS implements a power management module that coordinates with the scheduling module and leverages the node tagging information to place nodes in two power transition pools, i.e., the high-power and low-power state pools. A single power transition signal per pool is then broadcast to lower or raise the energy state of the nodes in the low-power and high-power state pools. Our evaluation on a Hadoop cluster shows that EPPADS achieves 30% and 22% performance improvement and 15% to 20% energy savings compared to the FIFO and DynMon schedulers, respectively.
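
The AccP step described above can be illustrated with a minimal sketch. The abstract does not give the exact formula for the task distribution ratio, so the sketch assumes that each node's share is inversely proportional to the initial execution time recorded during the SSP; the function name `accp_task_shares` and the node labels are hypothetical and not taken from the paper.

```python
from fractions import Fraction
from math import floor
from typing import Dict


def accp_task_shares(initial_times: Dict[str, float], remaining_tasks: int) -> Dict[str, int]:
    """Split the remaining tasks across nodes in one pass.

    ASSUMPTION (not from the paper): a node's share is proportional to
    1 / (its initial execution time measured in the slow-start phase).
    """
    if remaining_tasks < 0:
        raise ValueError("remaining_tasks must be non-negative")
    if not initial_times or any(t <= 0 for t in initial_times.values()):
        raise ValueError("every node needs a positive initial execution time")

    # Exact rational arithmetic avoids floating-point rounding surprises.
    speeds = {node: 1 / Fraction(t) for node, t in initial_times.items()}
    total_speed = sum(speeds.values())

    # Give each node the floor of its proportional share.
    shares = {
        node: floor(remaining_tasks * speed / total_speed)
        for node, speed in speeds.items()
    }

    # Hand out the few tasks lost to flooring, fastest nodes first.
    leftover = remaining_tasks - sum(shares.values())
    for node in sorted(speeds, key=speeds.get, reverse=True)[:leftover]:
        shares[node] += 1
    return shares


if __name__ == "__main__":
    times = {"n1": 10.0, "n2": 20.0, "n3": 40.0}  # seconds, hypothetical SSP measurements
    print(accp_task_shares(times, 70))
```

Under this assumption, a node that finished its initial task twice as fast receives twice as many of the remaining tasks, and the whole allocation is computed once so it could be dispatched in a single scheduling round.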

Highlights

  • For the past two decades, there has been a continuous trend in big data proliferation, with the amount of stored data doubling every 2 years [1]

  • Value: this value is used in the second pipeline stage of the Enhanced Phase-based Performance Aware Dynamic Scheduler (EPPADS); it defines the number of tasks to be distributed to a task tracker node during the accelerate phase (AccP)

  • We started with 4 task tracker nodes and doubled the number of nodes at each experimental run until the cluster reached the size of a large data processing cluster: 128 task trackers

Summary

INTRODUCTION

For the past two decades, there has been a continuous trend in big data proliferation, with the amount of stored data doubling every 2 years [1].

(iii) Increased instrumentation overheads, caused by the frequent collection of system-level information from the task tracker nodes to forecast task completion times and make scheduling decisions. Such communication overheads result in significant performance degradation and increased cluster power consumption.

Effective and simple cluster power management: EPPADS provides an energy-efficient computational model through a simple yet effective power management scheme that tightly integrates job scheduling with the power management functions. We achieve this by implementing pool-based, software-defined dynamic frequency scaling that transitions the power usage of nodes between high and low power states.
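
The pool-based power management idea can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the straggler test (initial time above 1.5 times the median), the mapping of stragglers to the low-power pool, and all function names are hypothetical. The one point taken from the text is that a single transition signal is issued per pool rather than per node.

```python
from statistics import median
from typing import Dict, List, Tuple


def tag_nodes(initial_times: Dict[str, float], slack: float = 1.5) -> Dict[str, str]:
    """Tag each node as 'effective' or 'straggler'.

    ASSUMPTION: a node is a straggler if its initial execution time
    exceeds slack * median; the paper's actual criterion is not given here.
    """
    cutoff = slack * median(initial_times.values())
    return {
        node: "straggler" if t > cutoff else "effective"
        for node, t in initial_times.items()
    }


def build_power_pools(tags: Dict[str, str]) -> Dict[str, List[str]]:
    """Group nodes into the two power transition pools.

    ASSUMPTION: stragglers go to the low-power pool, effective nodes
    to the high-power pool.
    """
    pools: Dict[str, List[str]] = {"high": [], "low": []}
    for node, tag in sorted(tags.items()):
        pools["low" if tag == "straggler" else "high"].append(node)
    return pools


def broadcast_signals(pools: Dict[str, List[str]]) -> List[Tuple[str, Tuple[str, ...]]]:
    """Build one (target_state, members) message per non-empty pool,
    instead of one message per node."""
    return [(state, tuple(members)) for state, members in pools.items() if members]


if __name__ == "__main__":
    times = {"n1": 10.0, "n2": 12.0, "n3": 11.0, "n4": 30.0}  # hypothetical measurements
    for state, members in broadcast_signals(build_power_pools(tag_nodes(times))):
        print(f"set {state}-power state on {', '.join(members)}")
```

With the hypothetical measurements above, only two messages are produced regardless of cluster size, which is the property the text attributes to the per-pool transition signal.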

BACKGROUND
28 Slide by offset to next scheduling window
VIII. CONCLUSION