Abstract

The authors implemented and measured several methods to perform BMMC permutations on the MasPar MP-2. Each method is coded in C and MPL and uses only the global router to transfer data among processors. Implementation in MPL limits the amount of control the programmer has over the global router. The methods used were a naive method; the PE method, which adapts the block BMMC algorithm for parallel disk systems to treat processor elements as independent devices; and the cluster method, which adapts the block BMMC algorithm to treat clusters of processor elements as independent devices. The authors also implemented variations of the first two methods to compute virtual-processor numbers in Gray-code order. Our results indicate that for random BMMC permutations, the best method overall is the naive method with the Gray-code variation. The PE methods are almost as good, especially when each processor element has several items to route. For some permutations, however, the naive method performs very poorly; the best method in these cases is the PE method with the Gray-code variant. Although one might expect the cluster method to be best, since it minimizes congestion at the global router ports, this method turns out to be the worst overall, and it is always much worse than the PE method. If it had been coded at a lower level than MPL, the cluster method may well have turned out to be the best in many cases.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.