Abstract

In a generalized shuffle permutation, an address $(a_{q-1} a_{q-2} \ldots a_0)$ receives its content from an address obtained through a cyclic shift on a subset of the $q$ dimensions used for the encoding of the addresses. Bit-complementation may be combined with the shift. We give an algorithm that requires $K/2 + 2$ exchanges for $K$ elements per processor when storage dimensions are part of the permutation and concurrent communication on all ports of every processor is possible. The number of element exchanges in sequence is independent of the number of processor dimensions $\sigma_r$ in the permutation. With no storage dimensions in the permutation, our best algorithm requires $(\sigma_r + 1)\lceil K/(2\sigma_r) \rceil$ element exchanges. We also give an algorithm for the case in which $\sigma_r = 2$, or the real shuffle consists of a number of cycles of length two, that requires $K/2 + 1$ element exchanges in sequence when there is no bit-complementation. The lower bound is $K/2$ for both real and mixed shuffles with no bit-complementation. The minimum number of communication start-ups is $\sigma_r$ for both cases, which is also the lower bound. The data transfer time for communication restricted to one port per processor is $\sigma_r K/2$, and the minimum number of start-ups is $\sigma_r$. The analysis is verified by experimental results on the Intel iPSC/1, and for one case also on the Connection Machine model CM-2.
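The address mapping described in the abstract can be illustrated with a small sequential sketch. It is not the paper's communication algorithm: it only computes which source address supplies each destination address. The function names `gsh_source` and `apply_gsh`, the `complement` mask argument, and the direction of the cyclic shift are assumptions made for illustration.

```python
def gsh_source(addr: int, dims: list[int], complement: int = 0) -> int:
    """Return the address whose content moves to `addr`.

    dims       -- bit positions taking part in the cyclic shift (a subset
                  of the q address dimensions); other bits stay fixed.
    complement -- bit mask of dimensions to complement after the shift.
    The shift direction chosen here is an assumption.
    """
    src = addr
    s = len(dims)
    for i, d in enumerate(dims):
        # Bit d of the source is taken from the previous dimension in the cycle.
        bit = (addr >> dims[(i - 1) % s]) & 1
        src = (src & ~(1 << d)) | (bit << d)
    return src ^ complement


def apply_gsh(data: list, dims: list[int], complement: int = 0) -> list:
    """Apply the permutation to a list indexed by address."""
    return [data[gsh_source(a, dims, complement)] for a in range(len(data))]


if __name__ == "__main__":
    # Cyclic shift over all three dimensions of an 8-element address space.
    print(apply_gsh(list(range(8)), [0, 1, 2]))
```

A cyclic shift over s dimensions has order s, so applying it s times restores the original order when no bits are complemented. That gives a quick sanity check on the mapping.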

Highlights

  • The main contributions of this paper are optimal algorithms for dimension permutations on Boolean cube configured distributed memory multi-processors, and lower bounds for such permutations with concurrent communication on all channels

  • An extended-cube permutation (ECP) is an algorithm for dimension permutation in which the routing is extended to an $n_e$-cube in which the $n$-cube holding data is embedded

  • Lemma 1. The time complexity of a Stable Dimension Permutation (SDP) of real order $r < n$ cannot be improved by communication in the $n - r$ processor dimensions not included in the index set, if a dimension permutation is required within all $r$-cubes, and the SDP algorithm uses full bandwidth within each $r$-cube

Summary

Introduction

The main contributions of this paper are optimal algorithms for dimension permutations on Boolean cube configured distributed memory multi-processors, and lower bounds for such permutations with concurrent communication on all channels. Virtual dimension permutations that only include local storage addresses require no communication, and are not considered here. In a Connection Machine model CM-2, each processor has a single 32-bit wide data path to memory, while inter-processor communication channels are 1-bit wide. Examples of dimension permutations are k-shuffle/unshuffle permutations, matrix transposition, bit-reversal, vector-reversal, and conversion between various data allocation schemes, such as consecutive and cyclic storage [3, 4], reshaping of arrays [7], and multi-sectioning. We consider concurrent communication on all channels of all processors. Such communication is possible on the Connection Machine.
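Two of the examples listed above, bit-reversal and matrix transposition, can be written as permutations of address bits. The sketch below is an illustration only, with no processors or communication involved. The helper names `permute_dims`, `bit_reversal`, and `transpose` are hypothetical, and the row-major address layout (row bits above column bits) is an assumption.

```python
def permute_dims(addr: int, perm: list[int]) -> int:
    """Build a new address whose bit i is bit perm[i] of `addr`."""
    out = 0
    for i, j in enumerate(perm):
        out |= ((addr >> j) & 1) << i
    return out


def bit_reversal(q: int) -> list[int]:
    """Dimension permutation that reverses a q-bit address."""
    return list(range(q - 1, -1, -1))


def transpose(p: int) -> list[int]:
    """Dimension permutation for transposing a 2^p x 2^p matrix stored
    row-major (assumed layout): swap the row and column bit fields."""
    return list(range(p, 2 * p)) + list(range(p))


if __name__ == "__main__":
    flat = list(range(16))  # 4 x 4 matrix, element value = row * 4 + col
    perm = transpose(2)
    transposed = [flat[permute_dims(a, perm)] for a in range(16)]
    for r in range(4):
        print(transposed[4 * r: 4 * r + 4])
```

In both cases the element at each address is fetched from the address with permuted bits, which is why such operations fall under the heading of dimension permutations.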

Preliminaries
Time complexity
Algorithms
A single GSH
Multiple GSH
K maxi ri
Real shuffle algorithms
Experiments
Findings
Summary and conclusions