Abstract

Array statements are often used to express data-parallelism in scientific languages such as Fortran 90 and High Performance Fortran. In compiling array statements for a distributed-memory machine, efficient generation of communication sets and local index sets is important. We show that for arrays distributed block-cyclically on multiple processors, the local memory access sequence and communication sets can be efficiently enumerated as closed forms using regular sections. First, closed form solutions are presented for arrays that are distributed using block or cyclic distributions. These closed forms are then used with avirtual processor approachto give an efficient solution for arrays with block-cyclic distributions. This approach is based on viewing a block-cyclic distribution as a block (or cyclic) distribution on a set of virtual processors, which are cyclically (or block-wise) mapped to physical processors. These views are referred to asvirtual-blockorvirtual-cyclicviews, depending on whether a block or cyclic distribution of the array on the virtual processors is used. The virtual processor approach permits different schemes based on the combination of the virtual processor views chosen for the different arrays involved in an array statement. These virtualization schemes have different indexing overhead. We present a strategy for identifying the virtualization scheme which will have the best performance. Performance results on a Cray T3D system are presented for hand-compiled code for array assignments. These results show that using the virtual processor approach, efficient code can be generated for execution of array statements involving block-cyclically distributed arrays.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call