Distributed Memory Systems Research Articles

In compile-time task scheduling for distributed-memory systems, list scheduling is generally accepted as an attractive approach, since it pairs low cost with good results. List-scheduling algorithms schedule tasks in order of their priority, which can be computed either (1) statically, before scheduling, or (2) dynamically, during scheduling. In this paper, we show that list scheduling with statically computed priorities (LSSP) can be performed at a significantly lower cost than existing approaches, without sacrificing performance. Our approach is general, i.e., it can be applied to any LSSP algorithm. The low complexity is achieved by using low-complexity methods for the two most time-consuming parts of list-scheduling algorithms, processor selection and task selection, while preserving the criteria used in the original algorithms. We exemplify our method by applying it to the MCP (Modified Critical Path) algorithm. Using an extension of this method, we can also reduce the time complexity of a particular class of list scheduling with dynamic priorities (LSDP), including algorithms such as DLS (Dynamic Level Scheduling), ETF (Earliest Task First) and ERT (Earliest Ready Task). Our results confirm that the modified versions of the list-scheduling algorithms obtain performance comparable to their original versions, yet at a significantly lower cost. We also show that the modified versions consistently outperform multi-step algorithms such as DSC-LLB (Dominant Sequence Clustering with List Load Balancing), which also have higher complexity, and clearly outperform algorithms in the same complexity class, such as CPM (Critical Path Method).
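
The scheduling loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the static priority is assumed to be the bottom level (longest computation-plus-communication path to an exit task), ready tasks are kept in a binary heap keyed on that priority, and processor selection is assumed to consider only two candidates, the processor that becomes idle earliest and the processor holding the enabling predecessor (the one whose data arrives last). The function names `bottom_levels` and `list_schedule` and their parameters are hypothetical.

```python
import heapq
from collections import defaultdict


def bottom_levels(tasks, comp, succ, comm):
    """Static priority (assumed): longest path from a task to an exit task,
    counting computation costs and all communication costs on the path."""
    memo = {}

    def bl(t):
        if t not in memo:
            memo[t] = comp[t] + max(
                (comm[(t, s)] + bl(s) for s in succ[t]), default=0
            )
        return memo[t]

    return {t: bl(t) for t in tasks}


def list_schedule(tasks, comp, succ, comm, num_procs):
    """Hypothetical LSSP sketch. Returns {task: (processor, start, finish)}.
    Tasks are appended to the end of a processor's timeline (no gap insertion)."""
    pred = defaultdict(list)
    for t in tasks:
        for s in succ[t]:
            pred[s].append(t)
    prio = bottom_levels(tasks, comp, succ, comm)

    # Task selection: ready tasks in a max-heap on static priority.
    missing = {t: len(pred[t]) for t in tasks}
    ready = [(-prio[t], t) for t in tasks if missing[t] == 0]
    heapq.heapify(ready)

    # Processor selection: min-heap of (idle time, processor); entries made
    # outdated by a later assignment are discarded lazily.
    proc_free = [0] * num_procs
    proc_heap = [(0, p) for p in range(num_procs)]
    placed = {}

    while ready:
        _, t = heapq.heappop(ready)

        while proc_heap[0][0] != proc_free[proc_heap[0][1]]:
            heapq.heappop(proc_heap)  # stale idle time, drop it
        candidates = {proc_heap[0][1]}  # processor that becomes idle first
        if pred[t]:
            # Enabling predecessor: the one whose data arrives last.
            enabling = max(pred[t], key=lambda u: placed[u][2] + comm[(u, t)])
            candidates.add(placed[enabling][0])

        def est(p):
            # Earliest start on p: all inputs arrived and p idle.
            # Communication is free between tasks on the same processor.
            arrival = max(
                (placed[u][2] + (0 if placed[u][0] == p else comm[(u, t)])
                 for u in pred[t]),
                default=0,
            )
            return max(arrival, proc_free[p])

        best = min(candidates, key=lambda p: (est(p), p))
        start = est(best)
        placed[t] = (best, start, start + comp[t])
        proc_free[best] = start + comp[t]
        heapq.heappush(proc_heap, (proc_free[best], best))

        # Release successors whose predecessors are now all scheduled.
        for s in succ[t]:
            missing[s] -= 1
            if missing[s] == 0:
                heapq.heappush(ready, (-prio[s], s))
    return placed
```

Because both selections go through heaps, each scheduling step needs only logarithmic-time heap operations plus work proportional to the task's in-degree, rather than a scan over all processors. Whether restricting the candidate set in this way preserves a particular algorithm's processor-selection criterion has to be checked for that algorithm.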

This paper describes and evaluates the use of aggressive static analysis in Jackal, a fine-grain Distributed Shared Memory (DSM) system for Java. Jackal uses an optimizing, source-level compiler rather than the binary rewriting techniques employed by most other fine-grain DSM systems. Source-level analysis makes existing access-check optimizations (e.g., access-check batching) more effective and enables two novel fine-grain DSM optimizations: object-graph aggregation and automatic computation migration. The compiler detects situations where an access to a root object is followed by accesses to subobjects. Jackal attempts to aggregate all access checks on objects in such object graphs into a single check on the graph's root object. If this check fails, the entire graph is fetched. Object-graph aggregation can reduce the number of network roundtrips and, since it is an advanced form of access-check batching, also improves sequential performance. Computation migration (or function shipping) is used to optimize critical sections in which a single processor owns both the shared data that is accessed and the lock that protects the data. It is usually more efficient to execute such critical sections on the processor that holds the lock and the data than to incur multiple roundtrips for acquiring the lock, fetching the data, writing the data back, and releasing the lock. Jackal's compiler detects such critical sections and optimizes them by generating single-roundtrip computation-migration code rather than standard data-shipping code. Jackal's optimizations improve both sequential and parallel application performance. On average, sequential execution times of instrumented, optimized programs are within 10% of those of uninstrumented programs. Application speedups usually improve significantly, and several Jackal applications perform as well as hand-optimized message-passing programs.
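
To make the roundtrip argument for object-graph aggregation concrete, here is a toy Python model. It is not Jackal's implementation or the code its compiler generates: the `Obj` and `Node` classes, the rule that a failed per-object check fetches only that object, and the rule that an aggregated graph is always fetched and cached as a unit are all simplifying assumptions. Each failed check is counted as one network roundtrip.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """A shared object; `children` are the subobjects reachable from it."""
    name: str
    children: list = field(default_factory=list)


class Node:
    """Hypothetical DSM node that counts network roundtrips for remote fetches."""

    def __init__(self):
        self.local = set()   # names of objects currently cached on this node
        self.roundtrips = 0

    def check(self, obj):
        # Per-object access check: a miss fetches only this object.
        if obj.name not in self.local:
            self.roundtrips += 1
            self.local.add(obj.name)

    def check_graph(self, root):
        # Aggregated check on the root only: a miss fetches the whole graph
        # in one roundtrip. Assumes the graph is always fetched as a unit,
        # so a cached root implies cached subobjects.
        if root.name not in self.local:
            self.roundtrips += 1
            stack = [root]
            while stack:
                obj = stack.pop()
                if obj.name not in self.local:
                    self.local.add(obj.name)
                    stack.extend(obj.children)


def traverse_checked(node, root):
    """Unoptimized code: every object access is preceded by its own check."""
    seen, stack = set(), [root]
    while stack:
        obj = stack.pop()
        if obj.name in seen:
            continue
        seen.add(obj.name)
        node.check(obj)
        stack.extend(obj.children)


def traverse_aggregated(node, root):
    """Optimized code: one check on the root covers the subobject accesses."""
    node.check_graph(root)
```

For a root with three subobjects and nothing cached, `traverse_checked` costs four roundtrips and `traverse_aggregated` costs one. The aggregated version also executes a single access check instead of four, which mirrors why aggregation improves sequential performance as well.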

Articles published on Distributed Memory Systems

Efficient parallel implementations of near Delaunay triangulation with High Performance Fortran

Progressive radiosity method on clusters using a new clipping algorithm

SPEC HPG benchmarks for high-performance systems

Shared virtual memory clusters: bridging the cost-performance gap between SMPs and hardware DSM systems

Node-based parallel computing of three-dimensional incompressible flows using the free mesh method

Parallel computing of high‐speed compressible flows using a node‐based finite‐element method

Using an interactive parallelisation toolkit to parallelise an ocean modelling code

A parallel implementation of an asynchronous team to the point-to-point connection problem

Java PastSet: a structured distributed shared memory system

Non-strict execution in parallel and distributed computing

Task scheduling using a block dependency DAG for block-oriented sparse Cholesky factorization

An efficient causal logging scheme for recoverable distributed shared memory systems

Probabilistic methods for centroidal Voronoi tessellations and their parallel implementations

O(N) parallel tight binding molecular dynamics simulation of carbon nanotubes

A data and task parallel image processing environment

Low-cost task scheduling for distributed-memory machines

Domain decomposition method for advection-diffusion equations with QSI scheme

A flexible framework for consistency management

Scalable Parallel Matrix Multiplication on Distributed Memory Parallel Computers

Source-level global optimizations for fine-grain distributed shared memory systems