Abstract

Modern data-parallel frameworks, such as Apache Spark, are designed to execute complex data processing jobs that contain a large number of tasks, with the dependencies between these tasks represented by a directed acyclic graph (DAG). When scheduling these tasks, the ultimate objective is to minimize the makespan of the schedule, which is equivalent to minimizing the job completion time. With task dependencies, however, minimizing the makespan is non-trivial, especially when the tasks in the DAG have different demands across multiple resource types. In this paper, we present Spear, a new scheduling framework designed to minimize the makespan of complex jobs while considering both task dependencies and heterogeneous resource demands at the same time. Inspired by recent advances in artificial intelligence, Spear applies Monte Carlo Tree Search (MCTS) in the specific context of task scheduling, and trains a deep reinforcement learning model to guide the expansion and rollout steps in MCTS. With deep reinforcement learning, the search focuses on more promising branches, which significantly improves search efficiency. Using both simulations and experiments with traces from production workloads, we compare the scheduling performance of Spear with state-of-the-art job schedulers in the literature, and find that Spear outperforms those approaches by up to 20%. Our results validate our claim that MCTS and deep reinforcement learning can readily be applied to optimize the scheduling of complex jobs with task dependencies.
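To make the scheduling problem concrete, the following is a minimal sketch and not Spear's implementation. It assumes a hypothetical four-task DAG, a hypothetical two-resource cluster capacity, and a simple list scheduler that starts ready tasks in a given priority order whenever they fit. In place of MCTS guided by a learned policy, it uses plain random rollouts over priority orders, which is enough to show that the order in which dependent tasks with different resource demands are started changes the makespan. All names (`TASKS`, `CAPACITY`, `makespan`, `random_search`) are illustrative.

```python
import random

# Hypothetical toy job: task -> (duration, (cpu, mem) demand, parent tasks).
TASKS = {
    "a": (1, (1, 1), []),
    "b": (4, (2, 2), ["a"]),
    "c": (1, (2, 2), ["a"]),
    "d": (3, (1, 1), ["c"]),
}
CAPACITY = (3, 3)  # assumed cluster capacity per resource type


def makespan(order, tasks=TASKS, capacity=CAPACITY):
    """Simulate a list schedule and return its makespan.

    At each decision point, ready tasks are started in priority `order`
    if their demand fits in the free resources. Assumes every task fits
    in an empty cluster and every duration is positive.
    """
    finish = {}   # finish time of every task started so far
    running = {}  # finish time of tasks still occupying resources
    now = 0
    while len(finish) < len(tasks):
        # Release resources of tasks that have completed by `now`.
        running = {t: f for t, f in running.items() if f > now}
        free = list(capacity)
        for t in running:
            for i, d in enumerate(tasks[t][1]):
                free[i] -= d
        # Start ready tasks in priority order while they fit.
        for t in order:
            if t in finish:
                continue
            dur, demand, parents = tasks[t]
            ready = all(p in finish and finish[p] <= now for p in parents)
            fits = all(d <= f for d, f in zip(demand, free))
            if ready and fits:
                finish[t] = running[t] = now + dur
                free = [f - d for f, d in zip(free, demand)]
        # Jump to the next task completion.
        now = min(running.values())
    return max(finish.values())


def random_search(n_rollouts=200, seed=0):
    """Flat Monte Carlo search: sample random priority orders, keep the best."""
    rng = random.Random(seed)
    names = list(TASKS)
    best_order, best = None, float("inf")
    for _ in range(n_rollouts):
        order = names[:]
        rng.shuffle(order)
        m = makespan(order)
        if m < best:
            best_order, best = order, m
    return best_order, best


print(makespan(["a", "b", "c", "d"]))  # 9: b blocks c, so d starts late
print(makespan(["a", "c", "b", "d"]))  # 6: c runs first, then b and d overlap
print(random_search())
```

In this toy instance, tasks b and c cannot run together, so prioritizing the short task c lets d overlap with the long task b. A tree search would explore such choices step by step rather than sampling whole orders at random, and a learned policy would bias which branches are tried first.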


