Abstract

We develop and implement an optimal broadcast algorithm for fully connected processor networks under a bidirectional communication model in which each processor can simultaneously send a message to one processor and receive a message from another, possibly different processor. For any number of processors p the algorithm requires N − 1 + ⌈ log p ⌉ communication rounds to broadcast N blocks of data from a root processor to the remaining processors, meeting the lower bound in the model. For data of size m , assuming that sending and receiving data of size m ′ takes time α + β m ′ , the best running time that can be achieved by the division of m into equal-sized blocks is ( ( ⌈ log p ⌉ − 1 ) α + β m ) 2 . The algorithm uses a regular, circulant graph communication pattern, and degenerates into a binomial tree broadcast when the number of blocks to be broadcast is one. The algorithm is furthermore well suited to fully connected clusters of SMP ( Symmetric Multi-Processor) nodes. The algorithm is implemented as part of an MPI ( Message Passing Interface) library. We demonstrate significant practical bandwidth improvements of up to a factor 1.5 over several other, commonly used broadcast algorithms on both a small SMP cluster and a 72 node NEC SX vector supercomputer.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.