Abstract

The complete exchange (or all-to-all personalized) communication pattern occurs in many important parallel computing applications. It is the densest form of communication because every processor must send a distinct message to every other processor. This can cause severe link contention and degrade performance considerably. Efficient algorithms are therefore needed to obtain good performance over a wide range of message and multiprocessor sizes. In this paper we present several algorithms to perform complete exchange on the Thinking Machines CM-5 and the Intel Touchstone Delta multiprocessors. Since these machines have different architectures and communication capabilities, different algorithms are needed to get the best performance on each of them. We present four algorithms for the CM-5 and six algorithms for the Delta. Complete exchange algorithms generally assume that the number of processors is a power of two. However, on the Delta the number of processors allocated by a user need not be a power of two. We propose algorithms that also apply to non-power-of-two meshes on the Delta. We have developed analytical models that estimate the performance of the algorithms from system parameters. Performance results on the CM-5 and Delta are also presented and analyzed.
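
To make the communication pattern concrete, the following is a minimal sketch of one well-known complete exchange schedule, the XOR-based pairwise exchange, simulated in plain Python. It is an illustration only and is not claimed to be any of the CM-5 or Delta algorithms from the paper. The names `pairwise_exchange` and `schedule` are hypothetical, and the sketch assumes a power-of-two processor count, which is exactly the restriction the abstract says the Delta algorithms relax.

```python
def pairwise_exchange(send):
    """Simulate complete exchange using an XOR pairwise schedule.

    send[i][j] is the block processor i holds for processor j.
    Returns recv, where recv[j][i] is the block j received from i.
    Assumption: the processor count is a power of two.
    """
    p = len(send)
    if p == 0 or p & (p - 1):
        raise ValueError("this sketch assumes a power-of-two processor count")

    recv = [[None] * p for _ in range(p)]
    for i in range(p):
        recv[i][i] = send[i][i]  # the local block needs no communication

    # In step k, processor i is paired with processor i XOR k, so every
    # processor has exactly one partner per step and p - 1 steps suffice.
    for step in range(1, p):
        for i in range(p):
            partner = i ^ step
            recv[partner][i] = send[i][partner]
    return recv


def schedule(p):
    """List the disjoint processor pairs that communicate in each step."""
    return [
        [(i, i ^ step) for i in range(p) if i < (i ^ step)]
        for step in range(1, p)
    ]


if __name__ == "__main__":
    p = 4
    send = [[(i, j) for j in range(p)] for i in range(p)]
    recv = pairwise_exchange(send)
    for step, pairs in enumerate(schedule(p), start=1):
        print(f"step {step}: {pairs}")
```

With a non-power-of-two count the XOR pairing can name a partner outside the processor set, which is why the sketch rejects such inputs rather than guessing at a fallback.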
