Abstract

The Vector Multiprocessor brings to the multiprocessor what vectorization brought to the single processor. In addition to the usual complement of logic and arithmetic units, each processor contains a programmable communication unit with registers that communicate directly with comparable registers in neighboring processors via an n-dimensional interconnection network. Interprocessor communication tasks are performed to and from these registers in the same way that computational tasks are performed on a uniprocessor. Communication is shown to be optimal for a large class of communication tasks. Elements are transmitted, in parallel, to their destination processors at an average rate of one per communication cycle. This result, called O(1) access, is used to develop a balanced communication system in which local and global access times are comparable. It is also used to support the vector parallel paradigm, in which all arrays are uniformly distributed and the user interface looks like a uniprocessor interface. Both coarse- and fine-grain performance models are provided; they demonstrate the unexpected result that communication time is asymptotically negligible compared with computation time. Finally, three performance models are presented for the spherical harmonic transform, which is the most communication-intensive part of climate model dynamics.
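
The abstract does not say how arrays are mapped onto processors in the vector parallel paradigm. The following minimal sketch assumes a contiguous block distribution, which is one common way to make "uniformly distributed" concrete: every processor holds either floor(n/p) or ceil(n/p) elements of an n-element array, so a global index can be translated to a (processor, local offset) pair and the user can keep addressing the array as if it lived on one machine. The function names `local_sizes`, `block_owner`, and `distribute` are hypothetical, and the sketch does not model the communication unit or the O(1) access result.

```python
from typing import List, Sequence, Tuple


def local_sizes(n: int, p: int) -> List[int]:
    """Number of elements each of p processors holds (assumed block layout).

    The first n % p processors get one extra element, so sizes differ by at most 1.
    """
    base, extra = divmod(n, p)
    return [base + 1 if rank < extra else base for rank in range(p)]


def block_owner(global_index: int, n: int, p: int) -> Tuple[int, int]:
    """Map a global index to (processor rank, local offset) under the assumed layout."""
    if not 0 <= global_index < n:
        raise IndexError("global index out of range")
    base, extra = divmod(n, p)
    # The first `extra` processors each hold base + 1 elements.
    boundary = extra * (base + 1)
    if global_index < boundary:
        return divmod(global_index, base + 1)
    # Remaining processors each hold `base` elements (base > 0 whenever we get here).
    rank, offset = divmod(global_index - boundary, base)
    return extra + rank, offset


def distribute(values: Sequence[int], p: int) -> List[List[int]]:
    """Split a global array into p contiguous local pieces."""
    parts: List[List[int]] = []
    start = 0
    for size in local_sizes(len(values), p):
        parts.append(list(values[start:start + size]))
        start += size
    return parts
```

For example, `distribute(list(range(10)), 4)` places 3, 3, 2, and 2 elements on processors 0 through 3, and `block_owner(6, 10, 4)` locates global element 6 at offset 0 on processor 2. A block layout keeps neighboring elements on the same or adjacent processors; a cyclic layout would be an equally valid reading of "uniformly distributed".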
