Abstract
In this paper we benchmark the performance of the Cray T3D, IBM 9076 SP/1 and Intel Paragon XP/S parallel computers, using implementations of parallel algorithms for the vector outer-product operation A = uv^T. Although very simple in nature, the vector outer-product requires a large number of floating-point operations, and its parallelization induces a high level of communication between the processors. It is thus well suited to measuring the relative speed of the processor, memory subsystem and network of a parallel computer. It should not be considered a ‘toy problem’, since it arises in numerical methods for the solution of systems of non-linear equations, still a difficult problem to solve. We present algorithms for both the explicit shared-memory and message-passing programming models, together with theoretical computation models for those algorithms. Experiments were run on these computers using Fortran 77 implementations of the algorithms. The results show that, owing to the high degree of communication between the processors, one needs a parallel computer with fast communications and carefully implemented data exchange routines. The theoretical computation model allows prediction of the speed-up to be obtained for a given problem size on a given number of processors. © 1997 John Wiley & Sons, Ltd.
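As an illustration of the operation being benchmarked, the following minimal Python sketch computes A = uv^T serially and then again with the rows of A divided among a number of hypothetical workers. The row-block partitioning and the names `outer_product`, `row_block_outer_product` and `num_workers` are assumptions made for this example only; they are not the algorithms or the Fortran 77 code of the paper, and no actual inter-processor communication is performed.

```python
def outer_product(u, v):
    """Return A = u v^T as a list of rows, with A[i][j] = u[i] * v[j]."""
    return [[ui * vj for vj in v] for ui in u]


def row_block_outer_product(u, v, num_workers):
    """Compute A = u v^T by splitting the rows of A among hypothetical workers.

    Assumed scheme: each worker owns a contiguous slice of u and needs all
    of v. On a distributed-memory machine, making v available to every
    worker is one place where communication would arise.
    """
    n = len(u)
    block = (n + num_workers - 1) // num_workers  # ceiling division
    rows = []
    for w in range(num_workers):
        lo, hi = w * block, min((w + 1) * block, n)
        # Worker w performs max(hi - lo, 0) * len(v) multiplications.
        rows.extend(outer_product(u[lo:hi], v))
    return rows


if __name__ == "__main__":
    u = [1.0, 2.0, 3.0, 4.0, 5.0]
    v = [10.0, 20.0]
    serial = outer_product(u, v)
    blocked = row_block_outer_product(u, v, num_workers=4)
    print(serial == blocked)
```

In this sketch every worker performs only one multiplication per element of A that it produces, so the cost of moving u, v and A between processors can easily be comparable to the arithmetic itself, which is consistent with the abstract's emphasis on communication.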