Abstract
System administrators of parallel computers face several difficulties in managing job scheduling systems. First, current production job schedulers expose many parameters; these appear to offer flexibility, but configuring and tuning them is highly challenging. Second, fair share is an important scheduling goal, but it is not clear what kind of fair share can be expected under current schedulers or how fair share affects scheduling performance. Third, several job runtime prediction methods have been proposed to improve on inaccurate user-estimated runtimes, but these methods can under-estimate runtimes by a large amount, and it is not clear whether they are practical for use on real systems. To address these issues, we study existing scheduling policies and design new ones. We evaluate policy performance by event-driven simulation using real job traces.
To simplify system administration, we propose a new scheduling framework in which administrators specify only high-level objectives, while the scheduler automatically decides the schedule according to those objectives and adapts to workload changes. We investigate several design and implementation choices for these goal-oriented policies. We show that, by optimizing directly for the given objectives, goal-oriented policies have the potential to improve performance considerably.
To provide a better understanding of the fair share policies supported by current production schedulers and their impact on scheduling performance, we evaluate two classes of fair share policies using a wide range of performance measures and several fair share measures proposed in this thesis. Our results show that fair share does keep heavy-demand users from dominating system resources. However, our detailed per-user results show that some types of users may be treated unfairly under fair share, possibly because of the priority mechanisms used by current schedulers.
As for runtime predictions, we find that previous methods lead to poor performance and unfairness because the predictions under-estimate runtimes. To mitigate these problems, we investigate several alternative methods, including inflating each initial prediction by half of the requested runtime and using two-class runtime estimates. We find that these alternative methods outperform previous methods in most cases.
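As an illustration only, one plausible reading of "inflating each initial prediction by half of the requested runtime" can be sketched as follows. The function name, the use of seconds, and the cap at the requested runtime are assumptions made for this sketch; the abstract does not specify them, and the thesis may define the method differently.

```python
def inflate_prediction(predicted: float, requested: float) -> float:
    """Return an inflated runtime estimate, in seconds.

    Hypothetical helper: adds half of the user-requested runtime to the
    initial prediction, so that under-estimates become less likely.
    """
    inflated = predicted + 0.5 * requested
    # Assumption: the estimate is capped at the requested runtime, since
    # schedulers commonly do not let a job run past its request.
    return min(inflated, requested)


# A job predicted to run 10 minutes with a 1-hour request.
short_job = inflate_prediction(600.0, 3600.0)
# A job predicted to run 50 minutes with a 1-hour request (cap applies).
long_job = inflate_prediction(3000.0, 3600.0)
```

Under these assumptions, a short prediction receives a large safety margin, while a prediction already close to the request is simply replaced by the request.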