Loop Execution Research Articles

Dynamic dataflow machines exploit parallelism among loop iterations by loop unraveling: all iterations of the loop are started together and operations in various iterations execute when their input data are present. Unbounded loop unraveling can strain the resources available on the machine and, in extreme cases, deadlock can occur due to overcommitment of resources. Previous efforts to address this problem have focused mainly on run-time mechanisms of debatable utility. Loop bounding, a compile-time technique, controls parallelism by permitting a fixed number of iterations to execute at one time. In this paper, we argue that loop bounding can lead to inefficient use of resources, and we propose an alternative way of compiling loops for overlapped execution of loop iterations. We introduce the notion of a stage decomposition of a loop, which defines a partition of the operations in a loop iteration into stages, and we show that the problem of choosing a stage decomposition for a particular loop can be tackled by applying static scheduling techniques like the ones used in generating code for VLIW machines. These techniques permit the compiler to allocate resources more skillfully than with loop bounding. The practical utility of stage decomposition remains to be tested on a real dataflow machine. In the absence of one, we describe how our schema could be implemented on the Monsoon dataflow machine being built at MIT.
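The contrast between unbounded unraveling and loop bounding can be illustrated with a toy model. The sketch below is not taken from the paper: the function name `simulate_bounded_loop` and its parameters are hypothetical, every iteration is assumed to take the same number of time steps, and dependences between iterations are ignored. It only shows how a bound `k` on the number of iterations in flight trades peak resource use against total running time.

```python
def simulate_bounded_loop(n_iters, k, latency):
    """Toy model of loop bounding (illustrative assumption, not the paper's schema).

    Each iteration occupies one resource slot for `latency` time steps, and at
    most `k` iterations may be in flight at once. Setting k == n_iters models
    unbounded loop unraveling. Returns (total_steps, peak_in_flight).
    """
    if k < 1 or latency < 1:
        raise ValueError("k and latency must be at least 1")

    next_iter = 0    # index of the next iteration waiting to start
    in_flight = []   # remaining time steps for each active iteration
    steps = 0
    peak = 0

    while next_iter < n_iters or in_flight:
        # Admit waiting iterations until the bound k is reached.
        while next_iter < n_iters and len(in_flight) < k:
            in_flight.append(latency)
            next_iter += 1
        peak = max(peak, len(in_flight))

        # Advance one time step; iterations on their last step retire.
        in_flight = [remaining - 1 for remaining in in_flight if remaining > 1]
        steps += 1

    return steps, peak


if __name__ == "__main__":
    # Bounded: at most 2 of 8 iterations run at a time.
    print(simulate_bounded_loop(8, 2, 3))
    # Unbounded unraveling: all 8 iterations start together.
    print(simulate_bounded_loop(8, 8, 3))
```

In this model the bound caps peak occupancy at `k` slots but stretches the run time, which is the kind of trade-off the abstract attributes to loop bounding. The model says nothing about stage decomposition itself.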

This paper describes an AL parallelizing compiler for the Warp systolic array. AL is a programming language which allows the user to program Warp as if it were a sequential computer and rely on the compiler to generate efficient parallel code. This paper introduces the notion of data relations for systolic array parallelizing compilers. Unlike dependence relations among statements of a program, data relations define compatibility relations among data objects of a program. The AL compiler uses data relations to compute data compatibility classes, determine data distribution, distribute loop iterations, and parallelize loop execution. The AL compiler can generate efficient parallel code almost identical to code the user would have written by hand. For example, the AL compiler generates parallel code for the LINPACK LU decomposition (SGEFA) and QR decomposition (SQRDC) routines with a nearly eight-fold speedup on the 10-cell Warp array for matrices of size 180 × 180.

Articles published on Loop Execution

Static scheduling for dynamic dataflow machines

A systolic array parallelizing compiler

On the design of VLSI architectures for parallel execution of DO loops

Run-time scheduling and execution of loops on message passing machines

Overlapped loop support in the Cydra 5

A simplified framework for reduction in strength

Optimization for the parallel execution of non-DO loops under Leading Iteration Model

Multiprocessor synchronization for concurrent loops

Iteration-level parallel execution of DO loops with a reduced set of dependence relations

Parallel execution of general loops by the pyramid method

Fast Execution of Loops with IF Statements

Fast execution of loops with if statements

Parallel execution of loops: The pyramid method

The parallel execution of loops: The parallelepiped method

Recognition of loop parallelisms by simulated execution

The parallel execution of DO loops

CDC 6600/7600 optimization

Instrumentation of a NASAP Subroutine

ILLIAC II-A Short Description and Annotated Bibliography

Distributed Solution of Network Programming Problems
