Abstract

We investigate algorithms for generating communication code for distributed-memory systems. We modify algorithms from previously published work and prove that the modified algorithms produce correct code. We then extend them to incorporate the mapping of virtual processors to physical processors and prove the correctness of this extension. Incorporating the mapping can reduce the number of interprocessor messages: in the examples we show, the total number of messages falls from O(N^2) to O(P^2), where N is the input size and P is the number of physical processors. Revisiting communication code generation and formally specifying how mapping is incorporated into it matters because it lets us use the many mapping and scheduling heuristics proposed in the literature. With a generalized mapping function, a different heuristic can be applied to each input program, thereby improving average performance.
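As a rough illustration of why folding the virtual-to-physical mapping into message generation can shrink the message count, the sketch below counts messages before and after mapping. It is not the paper's algorithm. The all-to-all communication pattern, the block mapping, and the names `block_map` and `count_messages` are assumptions chosen only to make the O(N^2) versus O(P^2) contrast concrete.

```python
def block_map(v, n, p):
    """Hypothetical block mapping: virtual processor v (0..n-1) to a physical processor (0..p-1)."""
    return v * p // n


def count_messages(n, p):
    """Count messages for an assumed all-to-all pattern among n virtual processors.

    Returns (virtual_count, physical_count), where physical_count treats all
    traffic between one ordered pair of physical processors as a single
    combined message and drops traffic that stays on one physical processor.
    """
    # One message per ordered pair of distinct virtual processors: n * (n - 1).
    virtual_msgs = [(s, d) for s in range(n) for d in range(n) if s != d]

    # After mapping, keep one entry per ordered pair of distinct physical processors.
    physical_msgs = {
        (block_map(s, n, p), block_map(d, n, p))
        for s, d in virtual_msgs
        if block_map(s, n, p) != block_map(d, n, p)
    }
    return len(virtual_msgs), len(physical_msgs)


if __name__ == "__main__":
    print(count_messages(16, 4))
```

Under these assumptions, 16 virtual processors on 4 physical processors give 240 virtual messages but only 12 combined interprocessor messages, that is, N(N-1) versus P(P-1).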
