Abstract

The author presents a fast and scalable matrix multiplication algorithm for distributed-memory concurrent computers, whose performance is independent of how the data are distributed across processors, and calls it DIMMA (distribution-independent matrix multiplication algorithm). The algorithm is based on two new ideas: it uses a modified pipelined communication scheme to overlap computation and communication effectively, and it exploits the LCM block concept to obtain the maximum performance of the sequential BLAS routine in each processor even when the block size is very small or very large. The algorithm is implemented and compared with SUMMA on the Intel Paragon computer.
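The LCM block concept mentioned above can be illustrated with a short sketch. In a two-dimensional block-cyclic distribution over a P x Q process grid, the layout repeats every lcm(P, Q) blocks in each dimension, because lcm(P, Q) is a multiple of both P and Q; blocks that far apart are therefore aligned on the same process in both the row and the column distribution, so they can be grouped into one wider panel and fed to a single DGEMM call instead of many thin ones. The minimal C sketch below (illustrative only, with hypothetical grid and block-size values; it is not code from the paper) computes that repetition period and the effective panel width obtained by grouping:

    #include <stdio.h>

    /* Greatest common divisor (Euclid's algorithm). */
    static int gcd(int a, int b) {
        while (b != 0) {
            int t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    /* Least common multiple; divide first to avoid overflow. */
    static int lcm(int a, int b) {
        return a / gcd(a, b) * b;
    }

    int main(void) {
        /* Hypothetical example parameters: a P x Q process grid and a
           block-cyclic distribution with block size nb. */
        int P = 4, Q = 6, nb = 8;

        /* The block-cyclic layout repeats every lcm(P, Q) blocks in each
           dimension, so blocks at that distance reside on the same process
           in both the row and the column distribution. */
        int period = lcm(P, Q);   /* 12 for this example grid */

        /* Grouping g such aligned blocks widens the local GEMM panel from
           nb to g * nb, amortizing the per-call overhead of small GEMMs. */
        for (int g = 1; g <= 4; g++) {
            printf("group %d LCM blocks -> panel width %d (stride %d blocks)\n",
                   g, g * nb, period);
        }
        return 0;
    }

The design point this is meant to convey: when nb is small, the sequential BLAS routine runs far below peak on nb-wide panels, and grouping blocks at LCM-block distance recovers a near-optimal panel width without redistributing any data.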


