Data Memory Layout Research Articles

One of the most fundamental tasks of an automatic parallelization tool is to find an optimal domain decomposition for a given application. For regular domain problems (such as simple matrix manipulations) this task may seem trivial. However, communication costs in message-passing programs often depend significantly on the memory layout of the data blocks to be transmitted. As a consequence, straightforward domain decompositions may be non-optimal. In this paper we introduce a new point-to-point communication model (called P-3PC) that is specifically designed to overcome this problem. In comparison with related models (e.g., LogGP), P-3PC is similar in complexity but more accurate in many situations. Although the model is aimed at MPI's standard point-to-point operations, it is applicable to similar message-passing definitions as well. The effectiveness of the model is tested in a framework for automatic parallelization of imaging applications. Experiments are performed on two Beowulf-type systems, each having a different interconnection network and a different MPI implementation. Results show that, where other models frequently fail, P-3PC correctly predicts the communication costs related to any type of domain decomposition.

One of the most fundamental problems confronting automatic parallelization tools is finding an optimal domain decomposition for a given application. For regular domain problems (such as simple matrix manipulations), this task may seem trivial. However, communication costs in message-passing programs often depend significantly on the memory layout of the data blocks to be transmitted. As a consequence, straightforward domain decompositions may be non-optimal. In this paper, we introduce a new point-to-point communication model, called P-3PC (Parameterized model based on the Three Paths of Communication), that is specifically designed to overcome this problem. In comparison with related models (e.g., LogGP), P-3PC is similar in complexity but more accurate in many situations. Although the model is aimed at MPI's standard point-to-point operations, it is applicable to similar message-passing definitions as well. The effectiveness of the model is tested in a framework for automatic parallelization of low-level image processing applications. Experiments are performed on two Beowulf-type systems, each having a different interconnection network and a different MPI implementation. The results show that, where other models frequently fail, P-3PC correctly predicts the communication costs related to any type of domain decomposition.
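
The dependence on memory layout mentioned above can be illustrated with a minimal sketch. It is not the P-3PC model and takes nothing from the paper beyond the general observation; it assumes a row-major matrix and a hypothetical helper, contiguous_runs, that counts how many separate contiguous memory segments a rectangular sub-block occupies. Two decompositions that transmit the same number of elements can differ greatly in how fragmented that data is in memory.

```python
def contiguous_runs(n_rows, n_cols, r0, r1, c0, c1):
    """Return (number_of_runs, elements_per_run) for the sub-block
    [r0:r1, c0:c1] of a row-major n_rows x n_cols matrix.

    Hypothetical helper for illustration only; it is not part of any
    published model or library.
    """
    if not (0 <= r0 < r1 <= n_rows and 0 <= c0 < c1 <= n_cols):
        raise ValueError("sub-block out of bounds or empty")
    height, width = r1 - r0, c1 - c0
    # Full-width blocks (or a single row) occupy one unbroken segment.
    if width == n_cols or height == 1:
        return 1, height * width
    # Otherwise each row of the block is its own segment.
    return height, width


# A 1024 x 1024 matrix split over 4 processes in two different ways.
row_block = contiguous_runs(1024, 1024, 0, 256, 0, 1024)  # row-wise split
col_block = contiguous_runs(1024, 1024, 0, 1024, 0, 256)  # column-wise split

print("row-wise block:", row_block)
print("column-wise block:", col_block)
```

Both blocks hold the same number of elements, but the column-wise block is spread over many short segments, so sending it typically requires either packing into a temporary buffer or a strided transfer. A cost model that considers only message size would treat the two cases as identical.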

Articles published on Data Memory Layout

Incorporating memory layout in the modeling of message passing programs

A network flow approach to memory bandwidth utilization in embedded DSP core processors

P-3PC: a point-to-point communication model for automatic and optimal decomposition of regular domain problems

Minimization of Data Address Computation Overhead in DSP Programs
