Optimal broadcast for fully connected processor-node networks

Jesper Larsson Träff,Andreas Ripke

doi:10.1016/j.jpdc.2007.12.001

Abstract

We develop and implement an optimal broadcast algorithm for fully connected processor networks under a bidirectional communication model in which each processor can simultaneously send a message to one processor and receive a message from another, possibly different processor. For any number of processors p the algorithm requires N − 1 + ⌈ log p ⌉ communication rounds to broadcast N blocks of data from a root processor to the remaining processors, meeting the lower bound in the model. For data of size m , assuming that sending and receiving data of size m ′ takes time α + β m ′ , the best running time that can be achieved by the division of m into equal-sized blocks is ( ( ⌈ log p ⌉ − 1 ) α + β m ) 2 . The algorithm uses a regular, circulant graph communication pattern, and degenerates into a binomial tree broadcast when the number of blocks to be broadcast is one. The algorithm is furthermore well suited to fully connected clusters of SMP ( Symmetric Multi-Processor) nodes. The algorithm is implemented as part of an MPI ( Message Passing Interface) library. We demonstrate significant practical bandwidth improvements of up to a factor 1.5 over several other, commonly used broadcast algorithms on both a small SMP cluster and a 72 node NEC SX vector supercomputer.

Full Text