Number Of Loop Iterations Research Articles

To improve a parallel program's performance, it is critical to evaluate how evenly the work contained in the program is distributed over all processors dedicated to the computation. Traditional work distribution analysis is commonly performed at the machine level. The disadvantage of this method is that it cannot identify whether the processors are performing useful or redundant (replicated) work. The paper describes a novel method of statically estimating the useful work distribution of distributed-memory parallel programs at the program level, which carefully distinguishes between useful and redundant work. The amount of work contained in a parallel program, which correlates with the number of loop iterations to be executed by each processor, is estimated by accurately modeling loop iteration spaces, array access patterns and data distributions. A cost function defines the useful work distribution of loops, procedures and the entire program. Lower and upper bounds of this parameter are presented. The computational complexity of the cost function is independent of the program's problem size, statement execution counts and loop iteration counts. As a consequence, estimating the work distribution with the described method is considerably faster than simulating the program or compiling and executing it. Automatic estimation of the useful work distribution is fully implemented as part of P3T, a static parameter-based performance prediction tool under the Vienna Fortran Compilation System (VFCS). The Lawrence Livermore Loops are used as a test case to verify the approach.
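As a rough illustration of why per-processor iteration counts can be estimated without running the loop, the following minimal sketch assumes a one-dimensional loop whose iterations are block-distributed over the processors. The function names and the imbalance measure are hypothetical and chosen for illustration only; they are not the cost function defined in the paper or implemented in P3T.

```python
def block_iterations(n_iters, n_procs):
    """Iterations owned by each processor under an assumed block distribution."""
    block = -(-n_iters // n_procs)  # ceiling division: size of one block
    counts = []
    for p in range(n_procs):
        lo = p * block
        hi = min(n_iters, lo + block)
        counts.append(max(0, hi - lo))
    return counts


def work_imbalance(counts):
    """Hypothetical measure: 0.0 is a perfectly even distribution, larger is worse."""
    mean = sum(counts) / len(counts)
    if mean == 0:
        return 0.0
    return (max(counts) - mean) / mean


counts = block_iterations(10, 4)
print(counts)                  # [3, 3, 3, 1]
print(work_imbalance(counts))
```

The cost of this computation depends on the number of processors and not on the number of loop iterations, which mirrors the abstract's claim that the estimate is independent of problem size.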


The problem of scheduling non-deterministic graphs arises in several situations in scheduling parallel programs, particularly in the cases of loops and conditional branching. When scheduling loops in a parallel program, non-determinism arises because the number of loop iterations may not be known before the execution of the program. However, since loops form a restricted class of conditional branching, there is a higher degree of non-determinism associated with scheduling conditional branching. In this case, the direction of every branch remains unknown before run time. It follows that entire subprograms of the parallel program may or may not get executed, which in turn increases the amount of non-determinism and complicates the scheduling process. Thus, the term non-determinism is frequently associated with conditional branching in the literature. In this paper, we study the problem of constructing a static schedule on parallel computers for task graphs that contain conditional branching. Generally, it is difficult to obtain optimal solutions to scheduling problems, even in the deterministic case. When non-determinism is added to the scheduling problem through conditional branching, an optimal solution is even harder to obtain. We start the paper with a brief discussion of the scheduling problem, then we introduce a model for representing parallel programs that contain branches. We present a two-step scheduling technique which employs two different approaches: a graph theoretic approach and a multi-phase approach. The first approach is based on exploring several graph theoretic properties of the model. This approach is used as a preprocessing step to decrease the amount of non-determinism before applying the multi-phase approach. In the second step, several execution instances of the program are generated, a schedule for every instance is obtained, and a unified schedule is constructed by merging the obtained schedules. Finally, we report the results of the experiments that we conducted to measure the performance of the techniques introduced in this paper.
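The notion of an execution instance can be made concrete with a small sketch. It assumes a simplified model in which the branches are independent of one another (no nesting) and each branch outcome selects a fixed list of tasks; the function name, the task names and the data layout are hypothetical. It covers only the generation of instances, not the scheduling of each instance or the merging step described in the abstract.

```python
from itertools import product


def execution_instances(unconditional, branches):
    """Yield (branch outcomes, set of executed tasks) for every outcome combination.

    branches maps a branch name to {outcome: [tasks that run on that outcome]}.
    Branches are assumed to be independent of one another.
    """
    names = sorted(branches)
    for outcomes in product(*(sorted(branches[n]) for n in names)):
        tasks = set(unconditional)
        for name, outcome in zip(names, outcomes):
            tasks.update(branches[name][outcome])
        yield dict(zip(names, outcomes)), tasks


# Hypothetical program: T1 and T6 always run; two branches pick the rest.
branches = {
    "b1": {"then": ["T2"], "else": ["T3", "T4"]},
    "b2": {"then": ["T5"], "else": []},
}
instances = list(execution_instances(["T1", "T6"], branches))
for choice, tasks in instances:
    print(choice, sorted(tasks))
```

With k independent two-way branches this yields 2^k instances, which suggests why a preprocessing step that reduces non-determinism before instances are generated is useful.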


Articles published on Number Of Loop Iterations

Potential and methods for embedding dynamic offloading decisions into application code

Analysis of Linear Definite Iterative Loops

DAG inlining: a decision procedure for reachability-modulo-theories in hierarchical programs

Modeling flow information of loops using compositional condition of controls

Infinite Loop Detection Based on Chains of Recurrence Algebra and Convergence of Iterative Sequence

Recognizing Plans with Loops Represented in a Lexicalized Grammar

An Improvement of Twisted Ate Pairing Efficient for Multi-Pairing and Thread Computing

Verification and falsification of programs with loops using predicate abstraction

Generalizing parametric timing analysis

Supporting Timing Analysis by Automatic Bounding of Loop Iterations

An Integrated Path and Timing Analysis Method based on Cycle-Level Symbolic Execution

On estimating the useful work distribution of parallel programs under P3T: a static performance estimator

Static Scheduling of Conditional Branches in Parallel Programs

Double- and triple-step incremental linear interpolation
