Abstract
Computational task DAGs are executed on parallel computers by a task scheduling algorithm. Intelligent scheduling is critical for achieving high parallelism, low overheads and reduced communication. A key technique for load balancing task DAGs is work stealing (WS), which Blumofe et al. popularized for fork-join computations [2]. In scenarios of high parallel slackness, WS's distributed nature allows it to scale to a large number of cores with low overhead [4]. However, the space of a WS computation grows proportionally to the number of cores. Targeting a lower bound, Blelloch et al. proposed the parallel-depth-first (PDF) scheduler [1]. PDF schedules tasks by following the depth-first (serial) order of computation and has space requirements closer to the serial execution. PDF has been shown to provide constructive cache sharing in modern multicore architectures [3]. However, implementing PDF requires a centralized scheduler which limits scalability. Targeting NUMA architectures, Olivier et al. proposed to load balance multiple PDF schedulers via WS [8]. While enabling scalability to larger systems, such approach still suffers from centralized scheduling of fine-grained parallelism [9]. Furthermore, for applications in which the amount of parallelism varies greatly, a fixed hierarchy of PDF queues is not enough.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.