Topology Aware Task Stealing for On-chip NUMA Multi-core Processors

Rajeev Wankar, C Raghavendra Rao, B Vikranth

doi:10.1016/j.procs.2013.05.201

Open Access

https://doi.org/10.1016/j.procs.2013.05.201

Journal: Procedia Computer Science
Publication Date: Jan 1, 2013
Citations: 13
License type: cc-by-nc-nd

Affiliation: University of Hyderabad

Abstract

On-Chip NUMA Architectures (OCNA) introduce a new challenge for scheduling methods, namely memory latency. Language runtimes and libraries try to exploit the processing power of multiple cores by mapping user-created tasks onto those cores, using scheduling algorithms with load-balancing support to improve throughput. The popular load-balancing techniques are work sharing and work stealing, and many runtime systems such as Cilk, TBB and Wool implement a task-stealing algorithm that multiplexes program-generated tasks onto the native worker threads supported by the operating system. However, the task-stealing strategy applied in present runtime systems assumes that all cores on the chip multiprocessor share the last-level cache (LLC) and a common bus. It tries to optimize utilization without considering the presence of multiple on-die DRAM controllers and their topological arrangement. Current task-stealing techniques also suffer from the problem of choosing the victim worker queue at random. In this paper we address these issues and propose a few optimizations. Our proposed task-stealing strategy dynamically analyzes the topology of the underlying hardware connections and models the groups of cores and connections as a logical topology tree. This logical tree is translated into multiple worker pools called stealing domains, and task stealing is restricted to within these domains. An implementation of this strategy shows an average of 1.24 times better performance on NAS Parallel Benchmark programs compared to the popular runtimes Cilk and OpenMP.
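
The idea of restricting steals to a stealing domain can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Worker`, `build_stealing_domains` and `steal` are hypothetical, the topology tree is flattened into a plain core-to-NUMA-node mapping, and the victim policy (pick the most loaded peer in the same domain) is an assumed placeholder, since the abstract does not say how the victim is chosen.

```python
from collections import deque


class Worker:
    """One worker thread pinned to a core (hypothetical model)."""

    def __init__(self, core_id, domain_id):
        self.core_id = core_id
        self.domain_id = domain_id
        self.tasks = deque()  # owner works at the right end


def build_stealing_domains(core_to_node):
    """Group workers by the NUMA node of their core.

    Assumption: `core_to_node` stands in for the logical topology
    tree; each node becomes one stealing domain.
    """
    domains = {}
    for core_id, node_id in sorted(core_to_node.items()):
        domains.setdefault(node_id, []).append(Worker(core_id, node_id))
    return domains


def steal(thief, domains):
    """Steal one task, looking only inside the thief's own domain."""
    victims = [
        w for w in domains[thief.domain_id]
        if w is not thief and w.tasks
    ]
    if not victims:
        return None  # never cross into another domain
    # Assumed policy: most loaded peer instead of a random victim.
    victim = max(victims, key=lambda w: len(w.tasks))
    return victim.tasks.popleft()  # take the victim's oldest task
```

With cores 0 and 1 on node 0 and cores 2 and 3 on node 1, an idle worker on core 0 can take work only from core 1. A steal attempt returns `None` when the only pending tasks sit in the other domain, which keeps stolen tasks close to the memory controller that serves them.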

Full Text