A new parallel matrix multiplication algorithm on distributed-memory concurrent computers

Jaeyoung Choi

doi:10.1002/(sici)1096-9128(199807)10:8<655::aid-cpe369>3.0.co;2-o

Journal: Concurrency: Practice and Experience
Publication Date: Jul 1, 1998
Citations: 35

Affiliation: Soongsil University

Keywords: distributed-memory concurrent computers, matrix multiplication algorithm

Abstract

We present a new fast and scalable matrix multiplication algorithm, called DIMMA (Distribution-Independent Matrix Multiplication Algorithm), for block cyclic data distribution on distributed-memory concurrent computers. The algorithm is based on two new ideas: it uses a modified pipelined communication scheme to overlap computation and communication effectively, and it exploits the LCM (least common multiple) block concept to obtain the maximum performance of the sequential BLAS (Basic Linear Algebra Subprograms) routine in each processor, even when the block size is very small or very large. The algorithm is implemented on the Intel Paragon computer and compared with SUMMA (Scalable Universal Matrix Multiplication Algorithm). © 1998 John Wiley & Sons, Ltd.
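The abstract gives no code, so the following is only a rough, hypothetical illustration of two ingredients it names, not the paper's DIMMA schedule. It assumes the usual 2-D block cyclic layout in which global block (I, J) lives on process (I mod P, J mod Q) of a P x Q grid, checks that this ownership pattern repeats every L = lcm(P, Q) blocks (which is what makes an L x L "LCM block" a natural unit), and mimics the small-block case by accumulating the product over panels of several consecutive nb-wide blocks, so that fewer, larger local updates perform the same arithmetic. The helper names `owner`, `blocks_per_process` and `grouped_matmul` are invented for this sketch; communication and pipelining are not modeled.

```python
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def owner(bi: int, bj: int, p: int, q: int) -> tuple[int, int]:
    """Grid coordinates of the process owning global block (bi, bj) in a
    2-D block cyclic layout on a p x q grid (block (0, 0) on process (0, 0))."""
    return (bi % p, bj % q)


def blocks_per_process(p: int, q: int) -> dict[tuple[int, int], int]:
    """Count the blocks each process owns inside one L x L LCM block, L = lcm(p, q)."""
    size = lcm(p, q)
    counts: dict[tuple[int, int], int] = {}
    for bi in range(size):
        for bj in range(size):
            proc = owner(bi, bj, p, q)
            counts[proc] = counts.get(proc, 0) + 1
    return counts


def grouped_matmul(a, b, nb: int, group: int):
    """Compute C = A * B by accumulating rank-w updates, w = nb * group.

    Each pass over one panel of `group` consecutive nb-wide k-blocks stands in
    for a single local GEMM call; returns (C, number_of_panel_updates)."""
    m, kdim, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    width = nb * group
    updates = 0
    for k0 in range(0, kdim, width):
        k1 = min(k0 + width, kdim)
        updates += 1  # one (hypothetical) local GEMM per panel
        for i in range(m):
            for kk in range(k0, k1):
                aik = a[i][kk]
                for j in range(n):
                    c[i][j] += aik * b[kk][j]
    return c, updates


if __name__ == "__main__":
    p, q = 2, 3
    L = lcm(p, q)
    # The owner pattern repeats every L blocks in both dimensions.
    assert all(owner(i + L, j + L, p, q) == owner(i, j, p, q)
               for i in range(L) for j in range(L))
    print(L, blocks_per_process(p, q))

    a = [[i + 2 * k for k in range(12)] for i in range(4)]
    b = [[k - j for j in range(5)] for k in range(12)]
    c1, n1 = grouped_matmul(a, b, nb=2, group=1)  # one update per small block
    c2, n2 = grouped_matmul(a, b, nb=2, group=L)  # one update per LCM-wide panel
    print(c1 == c2, n1, n2)
```

Run as a script, this prints `6` followed by a dictionary in which each of the six processes owns 6 blocks of the 6 x 6 LCM block, then `True 6 1`: the same product is obtained with one wide update instead of six narrow ones. On a real machine, the gain would come from the sequential BLAS routine running more efficiently on the wider panel; this pure-Python loop only counts the updates.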