Task Parallel Programming Research Articles

Recent work has proposed a memory property for parallel programs, called disentanglement, shown that it is pervasive in programs written in languages ranging from C/C++ to Parallel ML, and shown that it can be exploited to improve the performance of parallel functional programs. All existing work on disentanglement, however, considers the "fork/join" model of parallelism and does not apply to "futures", a more powerful approach to parallelism. This is not surprising: fork/join parallel programs exhibit a reasonably strict dependency structure (e.g., series-parallel DAGs), which disentanglement exploits. In contrast, with futures, parallel computations become first-class values of the language and can be created, passed between function calls, or stored in memory just like ordinary values, resulting in complex dependency structures, especially in the presence of mutable state. For example, parallel programs with futures can have deadlocks, which are impossible with fork/join parallelism. In this paper, we are interested in the theoretical question of whether disentanglement may be extended beyond fork/join parallelism, and specifically to futures. We consider a functional language with futures, Input/Output (I/O), and mutable state (references) and show that a broad range of programs written in this language are disentangled. We start by formalizing disentanglement for futures and proving that purely functional programs written in this language are disentangled. We then generalize this result in three directions. First, we consider state (effects) and prove that stateful programs are disentangled if they are race free. Second, we show that race freedom is a sufficient but not a necessary condition: non-deterministic programs, e.g., those that use atomic read-modify-write operations and some non-deterministic combinators, may also be disentangled. Third, we prove that disentangled task-parallel programs written with futures are free of deadlocks, which arise due to interactions between state and the rich dependencies that can be expressed with futures. Taken together, these results show that disentanglement generalizes to parallel programs with futures and, thus, the benefits of disentanglement may go well beyond fork/join parallelism.
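
The contrast between fork/join and futures drawn in the abstract above can be illustrated with a small sketch. It is written in Python with the standard `concurrent.futures` module, not in the functional language the paper studies, and it does not model disentanglement itself. The function names `fork_join_sum`, `make_future`, and `consume`, and the dictionary `cell`, are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fork_join_sum(pool, xs):
    # Fork/join style: the child task is created and joined inside the same
    # function, so the dependency structure stays series-parallel.
    mid = len(xs) // 2
    left = pool.submit(sum, xs[:mid])  # fork
    right = sum(xs[mid:])              # parent continues with its own work
    return left.result() + right       # join


def make_future(pool, xs):
    # Future style: the parallel computation is returned as an ordinary
    # value and outlives the function that created it.
    return pool.submit(sum, xs)


def consume(cell):
    # A different function joins on a future it finds in mutable state.
    return cell["pending"].result()


with ThreadPoolExecutor(max_workers=2) as pool:
    total = fork_join_sum(pool, [1, 2, 3, 4])
    cell = {"pending": make_future(pool, [10, 20, 30])}  # future stored in memory
    future_total = consume(cell)
```

In the second half of the sketch, the point at which the future is joined is decided by whoever reads `cell`, not by the code that created the task. This is the kind of dependency that falls outside the series-parallel structure of fork/join programs.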

Tapir (pronounced TAY-per) is a compiler intermediate representation (IR) that embeds recursive fork-join parallelism, as supported by task-parallel programming platforms such as Cilk and OpenMP, into a mainstream compiler’s IR. Mainstream compilers typically treat parallel linguistic constructs as syntactic sugar for function calls into a parallel runtime. These calls prevent the compiler from performing optimizations on and across parallel control constructs. Remedying this situation has generally been thought to require an extensive reworking of compiler analyses and code transformations to handle parallel semantics. Tapir leverages the “serial-projection property,” which is commonly satisfied by task-parallel programs, to handle the semantics of these programs without an extensive rework of the compiler. For recursive fork-join programs that satisfy the serial-projection property, Tapir enables effective compiler optimization of parallel programs with only minor changes to existing compiler analyses and code transformations. Tapir uses the serial-projection property to order logically parallel fine-grained tasks in the program’s control-flow graph. This ordered representation of parallel tasks allows the compiler to optimize parallel code effectively with only minor modifications. For example, to implement Tapir/LLVM, a prototype of Tapir in the LLVM compiler, we added or modified fewer than 3,000 lines of LLVM’s half-million-line core middle-end functionality. These changes sufficed to enable LLVM’s existing compiler optimizations for serial code (including loop-invariant code motion, common-subexpression elimination, and tail-recursion elimination) to work with parallel control constructs such as parallel loops and Cilk’s cilk_spawn keyword. Tapir also supports parallel optimizations, such as loop scheduling, which restructure the parallel control flow of the program. By making use of existing LLVM optimizations and new parallel optimizations, Tapir/LLVM can optimize recursive fork-join programs more effectively than traditional compilation methods. On a suite of 35 Cilk application benchmarks, Tapir/LLVM produces more efficient executables for 30 benchmarks, with faster 18-core running times for 26 of them, compared to a nearly identical compiler that compiles parallel linguistic constructs the traditional way.
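
The serial-projection property mentioned in the abstract above can be sketched informally: replacing each spawn with an ordinary call and dropping each sync yields a serial program with the same result. The sketch below is in Python with plain threads, not Cilk source and not Tapir's IR. The names `fib_parallel` and `fib_serial_projection` are hypothetical, and `Thread.start` and `Thread.join` only stand in for spawn and sync as an analogy.

```python
import threading


def fib_parallel(n):
    # Fork-join version: the child may run in parallel with the parent.
    if n < 2:
        return n
    result = {}

    def child():
        result["x"] = fib_parallel(n - 1)

    t = threading.Thread(target=child)
    t.start()                # plays the role of a spawn
    y = fib_parallel(n - 2)  # parent continues
    t.join()                 # plays the role of a sync
    return result["x"] + y


def fib_serial_projection(n):
    # Serial projection: the spawn becomes a plain call, the sync disappears.
    if n < 2:
        return n
    x = fib_serial_projection(n - 1)
    y = fib_serial_projection(n - 2)
    return x + y


parallel_value = fib_parallel(8)
serial_value = fib_serial_projection(8)
```

Both functions compute the same value for the same input. That agreement is the property that lets serial reasoning about the second function carry over to the first.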

Articles published on Task Parallel Programming

PASTA: Programming and Automation Support for Scalable Task-Parallel HLS Programs on Modern Multi-Die FPGAs

Disentanglement with Futures, State, and Interaction

Traveler: Navigating Task Parallel Traces for Performance Analysis

Extracting SIMD Parallelism from Recursive Task-Parallel Programs

Tapir

DuctTeip: An efficient programming model for distributed task-based parallel computing

Variable intra-task threading for power-constrained performance and energy optimization in DAG scheduling

Visualizing a Moving Target: A Design Study on Task Parallel Programs in the Presence of Evolving Data and Concerns

Blaze-Tasks

Global Dead-Block Management for Task-Parallel Programs

FunctionFlow: coordinating parallel tasks

Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel Programs

Pedagogy and tools for teaching parallel computing at the sophomore undergraduate level

NUMA-aware scheduling and memory allocation for data-flow task-parallel applications

Efficient execution of recursive programs on commodity vector hardware

Topology-Aware and Dependence-Aware Scheduling and Memory Allocation for Task-Parallel Languages

TProf: An energy profiler for task-parallel programs

Compiler multiversioning for automatic task granularity control

Fence-free work stealing on bounded TSO processors
