A new parallel matrix multiplication algorithm on distributed-memory concurrent computers

Jaeyoung Choi

doi:10.1109/hpc.1997.592151

Open Access

https://doi.org/10.1109/hpc.1997.592151

Publication Date: Apr 28, 1997

Citations: 10

Affiliation: Soongsil University

#Parallel Matrix Multiplication Algorithm #Distributed Memory Concurrent Computers

Abstract

The author presents DIMMA (Distribution-Independent Matrix Multiplication Algorithm), a new parallel matrix multiplication algorithm for distributed-memory concurrent computers that is fast and scalable and whose performance is independent of how the data are distributed across processors. The algorithm is based on two new ideas: it uses a modified pipelined communication scheme to overlap computation and communication effectively, and it exploits the LCM block concept to obtain the maximum performance of the sequential BLAS routine on each processor, whether the block size is very small or very large. The algorithm is implemented and compared with SUMMA on the Intel Paragon computer.
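
The abstract does not spell out the LCM block concept, so the following is only a minimal sketch of the arithmetic that the name suggests, not the paper's implementation. It assumes a two-dimensional block-cyclic distribution on a P x Q processor grid, in which block index k maps to processor row k mod P and processor column k mod Q. Under that assumption, block indices that differ by a multiple of lcm(P, Q) map to the same row and column pair, so several thin blocks could be grouped and handed to one larger local multiplication. The function names and the grid sizes are hypothetical.

```python
from collections import defaultdict
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def group_blocks_by_owner(num_blocks, P, Q):
    """Group block indices by their (k mod P, k mod Q) pair.

    Assumption: block-cyclic distribution on a P x Q grid, so block k
    maps to processor row k % P and processor column k % Q.
    """
    groups = defaultdict(list)
    for k in range(num_blocks):
        groups[(k % P, k % Q)].append(k)
    return dict(groups)


if __name__ == "__main__":
    P, Q = 2, 3  # hypothetical grid dimensions
    step = lcm(P, Q)
    for owner, blocks in sorted(group_blocks_by_owner(12, P, Q).items()):
        # Indices within one group are spaced exactly lcm(P, Q) apart.
        assert all(b - a == step for a, b in zip(blocks, blocks[1:]))
        print(owner, blocks)
```

Grouping the blocks this way is one plausible reading of why the local BLAS call need not operate on a single thin block even when the distribution block size is small; the full text should be consulted for the actual scheme.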
