PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers

Jaeyoung Choi, Jack J. Dongarra, David W. Walker

doi:10.1002/cpe.4330060702

Open Access

https://doi.org/10.1002/cpe.4330060702

Journal: Concurrency: Practice and Experience
Publication Date: Oct 1, 1994
Citations: 139

Affiliation: Oak Ridge National Laboratory, University of Tennessee at Knoxville

#Distributed Memory Concurrent Computers #Parallel Matrix Multiplication Algorithms

Abstract

The paper describes Parallel Universal Matrix Multiplication Algorithms (PUMMA) on distributed memory concurrent computers. The PUMMA package includes not only the non-transposed matrix multiplication routine C = A ⋅ B, but also the transposed multiplication routines C = A^T ⋅ B, C = A ⋅ B^T, and C = A^T ⋅ B^T, for a block cyclic data distribution. The routines perform efficiently for a wide range of processor configurations and block sizes. Together, the PUMMA routines provide the same functionality as the Level 3 BLAS routine xGEMM. Details of the parallel implementation of the routines are given, and results are presented for runs on the Intel Touchstone Delta computer.
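
The block cyclic data distribution mentioned in the abstract can be illustrated with a short sketch. The code below is not taken from the PUMMA package. It assumes the common two-dimensional convention of zero-based indices, a P x Q process grid, blocks of mb x nb elements, and the first block placed on process (0, 0). The function name `block_cyclic_owner` and all parameter names are hypothetical, and the paper's own conventions may differ.

```python
def block_cyclic_owner(i, j, mb, nb, P, Q):
    """Map global element (i, j) to its owning process and local position.

    Assumptions (illustrative, not from the paper): zero-based indices,
    mb x nb blocks, a P x Q process grid, first block on process (0, 0).
    """
    bi, bj = i // mb, j // nb        # global block row and block column
    p, q = bi % P, bj % Q            # blocks are dealt out cyclically
    li = (bi // P) * mb + i % mb     # row index inside the local array
    lj = (bj // Q) * nb + j % nb     # column index inside the local array
    return (p, q), (li, lj)


def owner_map(M, N, mb, nb, P, Q):
    """Return an M x N table of owning process coordinates."""
    return [
        [block_cyclic_owner(i, j, mb, nb, P, Q)[0] for j in range(N)]
        for i in range(M)
    ]


if __name__ == "__main__":
    # Print which process owns each element of a 4 x 6 matrix
    # distributed with 2 x 2 blocks over a 2 x 3 process grid.
    for row in owner_map(4, 6, 2, 2, 2, 3):
        print(row)
```

Under these assumptions, each process holds every P-th block row and every Q-th block column, which spreads the work evenly across the grid while keeping blocks large enough for efficient local matrix products.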
