Adding Tightly-Integrated Task Scheduling Acceleration to a RISC-V Multi-core Processor

Lucas Morais, Jaume Bosch, Michael Frank, Alfredo Goldman, Carlos Alvarez, Guido Araujo, Vitor Silva

doi:10.1145/3352460.3358271

Abstract

Task Parallelism is a parallel programming model that provides code annotation constructs to outline tasks and describe how their pointer parameters are accessed, so that a runtime capable of inferring and honoring their data dependence relationships can execute them in parallel and asynchronously. It is supported by several parallelization frameworks, such as OpenMP and StarSs. Overhead related to automatic dependence inference and to the scheduling of ready-to-run tasks is a major performance-limiting factor of Task Parallel systems. To amortize this overhead, programmers usually trade the higher parallelism that could be leveraged from finer-grained work partitions for the higher runtime efficiency of coarser-grained work partitions. Such problems are even more severe for systems with many cores, as the task spawning frequency required to keep cores from starving grows linearly with their number. To mitigate these problems, researchers have designed hardware accelerators to improve runtime performance. Nevertheless, the high CPU-accelerator communication overheads of these solutions have hampered their gains. We thus propose a RISC-V based architecture that minimizes communication overhead between the HW Task Scheduler and the CPU by allowing Task Scheduling software to interact directly with the former through custom instructions. Empirical evaluation of the architecture is made possible by an FPGA prototype featuring an eight-core Linux-capable Rocket Chip implementing such instructions. To evaluate the prototype's performance, we both (1) adapted Nanos, a mature Task Scheduling runtime, to benefit from the new task-scheduling-accelerating instructions; and (2) developed Phentos, a new HW-accelerated lightweight Task Scheduling runtime.
Our experiments show that task parallel programs using Nanos-RV (the Nanos version ported to our system) are on average 2.13 times faster than those serviced by baseline Nanos, while programs running on Phentos are 13.19 times faster, considering geometric means. Using eight cores, Nanos-RV delivers speedups of up to 5.62 times with respect to serial execution, while Phentos produces speedups of up to 5.72 times.

Publication Date: Oct 12, 2019
Citations: 8
License type: other-oa

Similar Papers

Programming models and applications for multicores and manycores
Pavan Balaji ... Zhiyi Huang
Concurrency and Computation: Practice and Experience | VOL. 28
26 Nov 2015

Characterizing and mitigating work time inflation in task parallel programs
10 Nov 2012

Characterizing and Mitigating Work Time Inflation in Task Parallel Programs
Stephen L Olivier ... Bronis R De Supinski
Scientific Programming | VOL. 21
01 Jan 2013

Autotuning of a Cut-Off for Task Parallel Programs
Shintaro Iwasaki ... Kenjiro Taura
01 Sep 2016
