Abstract

In this paper, we propose a scheme for matrix-matrix multiplication on a distributed-memory parallel computer. The scheme overlaps almost all of the communication with computation and uses the standard, optimized Level-3 BLAS operation on each node. As a result, the overall performance of the scheme is nearly equal to the performance of the optimized Level-3 BLAS operation times the number of nodes, which is the peak performance obtainable for a parallel BLAS. Another feature of our algorithm is that it still reaches peak performance for sufficiently large matrices even when the underlying communication network of the computer is slow.
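The central idea, overlapping panel transfers with the local Level-3 BLAS multiply, can be illustrated with a minimal single-process sketch. This is not the paper's algorithm; it only mimics the pipeline structure. The `fetch_panels` helper is a hypothetical stand-in for a non-blocking receive of the next panels from a neighbouring node, and `np.matmul` stands in for the node-local optimized GEMM.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def fetch_panels(a_panels, b_panels, k):
    # Hypothetical stand-in for a non-blocking network transfer
    # that delivers the k-th panels of A and B to this node.
    return a_panels[k], b_panels[k]

def overlapped_matmul(a_panels, b_panels, n):
    """Accumulate C = sum_k A_k @ B_k, requesting panel k+1 while
    the local Level-3 multiply works on panel k."""
    num_k = len(a_panels)
    c = np.zeros((n, n))
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_panels, a_panels, b_panels, 0)
        for k in range(num_k):
            a_k, b_k = pending.result()  # wait for the current panels
            if k + 1 < num_k:
                # Start the next transfer before computing, so the
                # "communication" overlaps the local GEMM below.
                pending = pool.submit(fetch_panels, a_panels, b_panels, k + 1)
            c += a_k @ b_k  # node-local Level-3 BLAS call
    return c
```

In the sketch, the fetch for step k+1 is issued before the multiply for step k begins, so when each GEMM takes longer than the transfer it hides, the loop runs at essentially the speed of the local BLAS, which is the effect the abstract describes.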
