Abstract
Shared memory multiprocessors have the appeal of presenting a common address space and requiring little data replication. However, they suffer from limited scalability due to a high degree of contention and non-uniform access to the shared memory. Distributed shared memory (DSM) multiprocessors with hardware cache coherency are gaining popularity because they offer scalability as well as ease of programming. The first commercial effort in this direction has come from Kendall Square Research in the form of the KSR1. This paper attempts to describe the kind of parallel algorithmic decomposition that an algorithm requires in order to perform well on this class of cache-coherent MIMD supercomputer. A computationally intensive problem in radiative transfer is considered for parallelization on the KSR1, and the transformations necessary for the algorithm to perform well are described. A technique is outlined for quickly obtaining a logical partitioning of the problem space that leads to near-optimal speedup. The performance of the parallel algorithm derived using this technique is quite promising. The results demonstrate that DSM eases application design, and that the concurrent processes can achieve a high degree of speedup after the serial algorithm's execution profile is generated at a functional level.