Code Partitioning Research Articles

In this paper, we propose a heuristic for code partitioning for distributed memory multiprocessors (DMMs). Our method is data-flow based, so all levels of parallelism can potentially be exploited. Given a weighted directed acyclic graph (DAG) representation of the program, our partitioning algorithm automatically determines the granularity of parallelism by partitioning the graph into tasks to be scheduled on the DMM. The granularity of parallelism depends only on the program to be executed and on the target machine parameters. The output of our algorithm is passed on as input to the scheduling phase. Unlike the scheduling problem as defined by Yang [A. Gerasoulis, T. Yang, IEEE Transactions on Parallel and Distributed Systems 4 (6) (1993) 686–701; T. Yang, Ph.D. Thesis, Rutgers University, New Brunswick, NJ, May 1993; T. Yang, A. Gerasoulis, IEEE Transactions on Parallel and Distributed Systems 5 (9) (1994) 951–967], the method presented in this paper uses task merging rather than task clustering. Finding an optimal solution to this problem is NP-complete. Because graph algorithms are expensive, it is nearly impossible to obtain close-to-optimal solutions without very high cost (a higher-order polynomial). Therefore, our goal is to find a heuristic that gives good performance at relatively low cost. Given a DAG with E edges and N nodes, the time complexity of our partitioning algorithm is O(E·N^3) in the worst case. For some cases, the average time complexity of the algorithm is O(N(E+N)).
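To make the idea of task merging on a weighted DAG concrete, the following is a minimal sketch, not the heuristic from the paper. It assumes a hypothetical merge rule: a node is folded into its only predecessor when the communication cost on the connecting edge is at least the node's own computation cost. Restricting merges to nodes with a single predecessor keeps the graph acyclic. The function name, the input layout, and the merge rule are all assumptions made for illustration.

```python
def merge_tasks(node_cost, edges):
    """Greedy task-merging sketch on a weighted DAG (illustrative only).

    node_cost: dict mapping node -> computation cost
    edges:     dict mapping (u, v) -> communication cost of edge u -> v
    Returns (owner, cost, edges) where owner maps every original node
    to the task (representative node) it ended up in.
    """
    cost = dict(node_cost)
    edges = dict(edges)
    owner = {n: n for n in node_cost}

    changed = True
    while changed:
        changed = False
        for (u, v), comm in list(edges.items()):
            preds = [a for (a, b) in edges if b == v]
            # Assumed rule: merge v into u only if u is v's sole
            # predecessor (so no cycle can appear) and sending the data
            # costs at least as much as running v locally.
            if preds == [u] and comm >= cost[v]:
                cost[u] += cost.pop(v)
                del edges[(u, v)]
                # Redirect v's outgoing edges so they leave u instead.
                for (a, b), w in list(edges.items()):
                    if a == v:
                        del edges[(a, b)]
                        edges[(u, b)] = edges.get((u, b), 0) + w
                owner = {n: (u if o == v else o) for n, o in owner.items()}
                changed = True
                break  # edge set changed; rescan from the start
    return owner, cost, edges
```

Calling `merge_tasks({"a": 4, "b": 1, "c": 5, "d": 2}, {("a", "b"): 3, ("a", "c"): 1, ("b", "d"): 1, ("c", "d"): 1})` folds `b` into `a`, because the edge cost 3 exceeds the computation cost 1 of `b`, and leaves `c` and `d` as separate tasks. This sketch rescans the edge list after every merge and makes no attempt to match the complexity bounds stated above.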

Because sending and receiving data incurs significant communication overhead, loop partitioning approaches on distributed memory systems must guarantee not just computation load balance but computation+communication load balance. Previous approaches to loop partitioning have achieved a communication-free, computation load balanced iteration space partitioning for a limited subset of DOALL loops. But a large category of DOALL loops inevitably results in communication, and the trade-offs between computation and communication must be carefully analyzed for these loops in order to balance the combined computation time and communication overhead. In this work, we describe a partitioning approach based on this motivation for the general case of DOALL loops. Our goal is to achieve a computation+communication load balanced partitioning through static data and iteration space distribution. Our approach first partitions the iteration and data spaces of a loop nest by analyzing communication and parallelism; it then performs architecture-dependent analysis to adjust the granularity of partitions, load balances each partition with respect to total computation+communication, and maps the partitions onto the available number of processors. This multiphase partitioning method works as follows. First, the code partitioning phase analyzes the references in the body of the DOALL loop nest and determines a set of directions that eliminate a larger degree of communication at the price of a lesser degree of parallelism. The partitioning is carried out in the iteration space of the loop by cyclically following a set of direction vectors such that the data references are maximally localized and reused, eliminating more communication volume than parallelism. We then perform data space partitioning based on a new "larger partition owns" rule, which minimizes the communication overhead of a compute-intensive partition by localizing its references relatively more than those of a smaller, non-compute-intensive partition. A partition interaction graph is then constructed, which the architecture-dependent analysis phase uses to merge partitions in order to achieve granularity adjustment, computation+communication load balance, and mapping onto the actual number of available processors. Relevant theory and algorithms are developed along with a performance evaluation on the Cray T3D.
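One plausible reading of the "larger partition owns" rule can be sketched as follows. This is an illustration under stated assumptions, not the algorithm from the paper: each iteration partition is described by the set of data elements it references and by a size (for example, its iteration count), and every data element is assigned to the largest partition that references it. The function name, the input layout, and the tie-breaking behavior (first partition encountered wins) are hypothetical.

```python
def larger_partition_owns(partition_refs, partition_size):
    """Assign each data element to the largest partition referencing it.

    partition_refs: dict mapping partition id -> set of data elements
                    referenced by that partition's iterations
    partition_size: dict mapping partition id -> size (e.g. iteration count)
    Returns a dict mapping data element -> owning partition id.
    """
    owner = {}
    for part, refs in partition_refs.items():
        for elem in refs:
            current = owner.get(elem)
            # Assumed rule: a strictly larger partition takes ownership,
            # so on a tie the earlier partition keeps the element.
            if current is None or partition_size[part] > partition_size[current]:
                owner[elem] = part
    return owner
```

For example, with `partition_refs = {"P0": {0, 1, 2}, "P1": {2, 3}, "P2": {3, 4}}` and sizes `{"P0": 10, "P1": 4, "P2": 6}`, element 2 goes to `P0` and element 3 goes to `P2`, so the small partition `P1` fetches both of its references remotely while the larger partitions keep theirs local.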

Articles published on Code Partitioning

Equivalent formulations and necessary optimality conditions for the Lennard–Jones problem

Wavelet-based very low bit-rate video coding using image warping and overlapped block motion compensation

Dynamic code partitioning for clustered architectures

Error-resilient coding in JPEG-2000 and MPEG-4

An efficient heuristic for code partitioning

Intermediacy Prediction for High Speed Berger Code Checkers

Compilation techniques for parallel systems

A Computation+Communication Load Balanced Loop Partitioning Method for Distributed Memory Systems

Adapting to Hostile Architectural Environments

Generalized algorithm for design of DC-free codes based on multilevel partition chain

Performance modeling and code partitioning for the DS architecture

Analysis of a Heuristic for Code Partitioning

Partitions: A taxonomy of types and representations and an overview of coding techniques

Generalized concatenation of convolutional codes

Computer aided parallelisation tools (CAPTools) — conceptual overview and performance on the parallelisation of structured mesh codes

Automatic parallel code generation for message passing on distributed memory systems

The finest homophonic partition and related code concepts

Throughput analysis of digital partitioning with error-correcting codes for optical matrix–vector processors

Combining digital partitioning and error-correcting codes for high accuracy optical computing

Efficient register allocation via coloring using clique separators
