Parallel Matrix Multiplication: A Systematic Journey

Martin D Schatz, Robert A Van De Geijn, Jack Poulson

doi:10.1137/140993478

Abstract

We expose a systematic approach for developing distributed-memory parallel matrix-matrix multiplication algorithms. The journey starts with a description of how matrices are distributed to meshes of nodes (e.g., MPI processes), relates these distributions to scalable parallel implementation of matrix-vector multiplication and rank-1 update, continues on to reveal a family of matrix-matrix multiplication algorithms that view the nodes as a two-dimensional (2D) mesh, and finishes with extending these 2D algorithms to so-called three-dimensional (3D) algorithms that view the nodes as a 3D mesh. A cost analysis shows that the 3D algorithms can attain the (order of magnitude) lower bound for the cost of communication. The paper introduces a taxonomy for the resulting family of algorithms and explains how all algorithms have merit depending on parameters such as the sizes of the matrices and architecture parameters. The techniques described in this paper are at the heart of the Elemental distributed-memory linear algebra library. Performance results from implementation within and with this library are given on a representative distributed-memory architecture, the IBM Blue Gene/P supercomputer.
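To make the 2D-mesh idea in the abstract concrete, here is a minimal serial sketch in plain Python. It is not the authors' implementation and does not use the Elemental API or MPI. It assumes an element-wise cyclic distribution in which entry (i, j) of a matrix lives on mesh node (i mod r, j mod c), and it simulates a multiplication that keeps C in place and builds it from rank-1 updates. The function names `distribute`, `gather`, and `mesh_matmul` are hypothetical, and the "broadcasts" are ordinary list reads standing in for collective communication.

```python
def distribute(M, r, c):
    """Assumed cyclic layout: M[i][j] is stored on node (i % r, j % c)."""
    return {(s, t): [row[t::c] for row in M[s::r]]
            for s in range(r) for t in range(c)}


def gather(local, r, c, m, n):
    """Reassemble an m x n matrix from the per-node local blocks."""
    M = [[0] * n for _ in range(m)]
    for (s, t), block in local.items():
        for li, row in enumerate(block):
            for lj, value in enumerate(row):
                M[s + li * r][t + lj * c] = value
    return M


def mesh_matmul(A, B, r, c):
    """Simulate C = A B on an r x c mesh using one rank-1 update per index p."""
    m, k, n = len(A), len(B), len(B[0])
    A_loc = distribute(A, r, c)
    B_loc = distribute(B, r, c)
    # Each node holds the part of C that the same cyclic layout assigns to it.
    C_loc = {(s, t): [[0] * len(range(t, n, c)) for _ in range(s, m, r)]
             for s in range(r) for t in range(c)}
    for p in range(k):
        owner_col = p % c  # mesh column that stores column p of A
        owner_row = p % r  # mesh row that stores row p of B
        for s in range(r):
            # Stand-in for a broadcast within mesh row s:
            # the entries of column p of A whose row index is s mod r.
            a_piece = [row[p // c] for row in A_loc[(s, owner_col)]]
            for t in range(c):
                # Stand-in for a broadcast within mesh column t:
                # the entries of row p of B whose column index is t mod c.
                b_piece = B_loc[(owner_row, t)][p // r]
                C_block = C_loc[(s, t)]
                # Local rank-1 update; no further data is needed on this node.
                for li, a in enumerate(a_piece):
                    for lj, b in enumerate(b_piece):
                        C_block[li][lj] += a * b
    return gather(C_loc, r, c, m, n)
```

Calling `mesh_matmul(A, B, 2, 3)` on small nested lists should agree with an ordinary triple-loop product. The point of the sketch is only that, under the assumed layout, each node needs just a slice of one column of A and one row of B per step, which is the connection between matrix distribution and rank-1 update that the abstract mentions.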


Journal: SIAM Journal on Scientific Computing
Publication Date: Jan 1, 2016
Citations: 39

Similar Papers

Comparison of 2D and 3D algorithms for adding a margin to the gross tumor volume in the conformal radiotherapy planning of prostate cancer
Vincent S Khoo ... David P Dearnaley
International Journal of Radiation Oncology, Biology, Physics | VOL. 42
01 Oct 1998

Three-Dimensional Weightbearing Assessment of the First Ray in Hallux Valgus: A Case-Control Study
Francois Lintz ... Alessio Bernasconi
Foot & Ankle Orthopaedics | VOL. 4
01 Oct 2019

Efficient Distributed Algorithms for Convolutional Neural Networks
Rui Li ... Aravind Sukumaran-Rajam
06 Jul 2021

Accuracy of device-specific 2D and 3D image distortion correction algorithms for magnetic resonance imaging of the head provided by a manufacturer
Christian P Karger ... Angelika Höss
Physics in Medicine & Biology | VOL. 51
06 Jun 2006
