Abstract

Linear transformation techniques for mapping nested FOR loops onto systolic arrays all suffer from the “curse of dimensionality”: the deeper the loop nesting, the higher the mapping complexity. We propose a new approach that first finds coarse-grained mappings, which are easier to determine, and then refines them through simple transformations to obtain efficient fine-grained mappings. The grain size refers to the size of the computational ‘chunks’ performed by a processor between two synchronization points. In the simplest case, a coarse-grained mapping maps the individual iterations of a nested-loop algorithm, assuming they are executed asynchronously. A fine-grained mapping, on the other hand, maps the individual operations of an iteration, assuming the processors are synchronized at the end of each operation. In this paper, we propose refinement techniques that derive efficient fine-grained mappings for linear systolic arrays. We demonstrate how these techniques can be used to easily derive efficient mappings for the all-pairs shortest path and matrix multiplication algorithms. For matrix multiplication, we obtain a piecewise linear mapping whose performance is comparable to that of some well-known mappings for linear arrays.
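
As a concrete picture of what a linear mapping onto a linear array looks like, the following minimal sketch maps each iteration (i, j, k) of an n x n x n matrix multiplication loop nest to a processor and a time step, then checks that the mapping is valid. The particular mapping (processor = j, time = n*i + j + k), the function names, and the dependence vectors are assumptions chosen for illustration; this is a textbook-style linear mapping, not the coarse-to-fine refinement or the piecewise linear mapping proposed in the paper.

```python
from itertools import product

# Assumed uniform dependences of the matrix multiplication loop nest
# C[i][j] += A[i][k] * B[k][j]:
#   (1, 0, 0): B[k][j] is reused across i
#   (0, 1, 0): A[i][k] is reused across j
#   (0, 0, 1): C[i][j] is accumulated across k
DEPENDENCES = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]


def map_iteration(i, j, k, n):
    """Hypothetical linear mapping: returns (processor, time step)."""
    return j, n * i + j + k


def is_conflict_free(n):
    """True if no two iterations share the same (processor, time) slot."""
    seen = set()
    for i, j, k in product(range(n), repeat=3):
        slot = map_iteration(i, j, k, n)
        if slot in seen:
            return False
        seen.add(slot)
    return True


def respects_dependences(n):
    """True if every dependence moves forward in time and travels at most
    one processor per time step (nearest-neighbour links only).

    Because the mapping is linear, checking each dependence vector once,
    relative to the origin, covers every iteration.
    """
    p0, t0 = map_iteration(0, 0, 0, n)
    for d in DEPENDENCES:
        p1, t1 = map_iteration(*d, n)
        dt, dp = t1 - t0, abs(p1 - p0)
        if dt <= 0 or dp > dt:
            return False
    return True


def makespan(n):
    """Number of time steps used by the mapping for an n x n x n nest."""
    return max(map_iteration(i, j, k, n)[1]
               for i, j, k in product(range(n), repeat=3)) + 1


if __name__ == "__main__":
    n = 4
    print(is_conflict_free(n), respects_dependences(n), makespan(n))
```

Under these assumptions the mapping uses n processors and n^2 + n - 1 time steps. The checks also hint at why the search gets harder with deeper nesting: each extra loop adds a coefficient to both the processor and the time function, and each must satisfy the same conflict and dependence constraints.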
