Goal-Oriented Job Scheduling for Parallel Computer Systems

Sangsuree Vasupongayya

doi:10.15760/etd.7938

Abstract

System administrators for parallel computers face many difficulties when managing job scheduling systems. First, current production job schedulers expose many parameters; this appears flexible, but configuring and tuning the parameters is highly challenging. Second, fair share is an important scheduling goal, but it is not clear what kind of fair share can be expected under current schedulers or how fair share affects scheduling performance. Third, several job runtime prediction methods have been proposed to improve on inaccurate user-estimated runtimes, but these methods can under-estimate runtimes by a large amount, and it is not clear whether they are practical for use on real systems. To address these issues, we study existing scheduling policies and design new policies. We evaluate policy performance by event-driven simulation, using real job traces. To simplify the system administration task, we propose a new scheduling framework that allows system administrators to specify only high-level objectives, while the scheduler automatically decides the schedules according to the given objectives and adapts to workload changes. We investigate several design and implementation choices for the goal-oriented policies. We show that, by optimizing performance for the given objectives, goal-oriented policies have the potential to improve performance considerably. To provide a better understanding of the fair share policies supported by current production schedulers and their impact on scheduling performance, we evaluate two classes of fair share policies using a wide range of performance measures and several fair share measures proposed in this thesis. Our evaluation results show that fair share indeed keeps heavy-demand users from dominating system resources. However, our detailed per-user performance results show that some types of users may suffer unfairness under fair share, possibly due to the priority mechanisms used by current schedulers.
As for runtime predictions, we find that using previous methods results in poor performance and unfairness problems because the predictions under-estimate runtimes. To reduce these problems, we investigate several alternative methods, including inflating each initial prediction by half of the requested runtime and using two-class runtime estimates. We find that these alternative methods can outperform previous methods in most cases.
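
The inflation idea mentioned above can be illustrated with a minimal sketch. The abstract states only that each initial prediction is inflated by half of the requested runtime; the function name, the use of seconds, and the cap at the requested runtime are assumptions made for this illustration, not details taken from the thesis.

```python
def inflate_prediction(predicted: float, requested: float) -> float:
    """Inflate an initial runtime prediction by half of the requested runtime.

    Assumption (not from the abstract): the result is capped at the
    user-requested runtime, on the premise that a scheduler would not
    let a job run past its request anyway.
    """
    if predicted < 0 or requested <= 0:
        raise ValueError("predicted must be >= 0 and requested must be > 0")
    inflated = predicted + 0.5 * requested
    return min(inflated, requested)


# Hypothetical job: predicted to run 600 s, user requested 3600 s.
print(inflate_prediction(600, 3600))   # 600 + 1800 = 2400.0
# A prediction already close to the request is capped at the request.
print(inflate_prediction(2400, 3600))  # 3600
```

The intent of such an adjustment is to reduce how often a prediction falls short of the actual runtime, at the cost of a looser estimate.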


