Abstract
In this study we analyze how to select the most suitable algorithm for a given matrix-matrix multiplication operation, i.e., the one that achieves the highest throughput in the minimum time. To that end, a comparative analysis and performance evaluation of several algorithms is carried out using identical performance parameters.
Highlights
Most parallel algorithms for matrix multiplication use a matrix decomposition that is based on the number of available processors.
Matrix multiplication can be performed in O(n) time on a mesh with wraparound connections and n×n processors (Cannon, 1969); in O(log n) time on a three-dimensional mesh of trees with n³ processors (Leighton, 1992); and in O(log n) time on a hypercube or shuffle-exchange network with n³ processors (Dekel et al., 1983).
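To make Cannon's algorithm concrete, the following is a minimal sequential simulation of its block-shifting scheme on a p×p grid of "processors". The function name `cannon_matmul` and the assumption that n is divisible by p are choices made for this sketch, not part of the original algorithm's presentation.

```python
import numpy as np

def cannon_matmul(A, B, p):
    """Sequential simulation of Cannon's algorithm on a p x p block grid.

    Assumes square n x n matrices with n divisible by p.
    """
    n = A.shape[0]
    b = n // p
    # Partition A, B into p x p grids of b x b blocks.
    Ab = [[A[i*b:(i+1)*b, j*b:(j+1)*b].copy() for j in range(p)] for i in range(p)]
    Bb = [[B[i*b:(i+1)*b, j*b:(j+1)*b].copy() for j in range(p)] for i in range(p)]
    Cb = [[np.zeros((b, b)) for _ in range(p)] for _ in range(p)]
    # Initial alignment: shift row i of A left by i, column j of B up by j.
    Ab = [[Ab[i][(j + i) % p] for j in range(p)] for i in range(p)]
    Bb = [[Bb[(i + j) % p][j] for j in range(p)] for i in range(p)]
    for _ in range(p):
        # Each "processor" (i, j) multiplies its current blocks.
        for i in range(p):
            for j in range(p):
                Cb[i][j] += Ab[i][j] @ Bb[i][j]
        # Shift A blocks one step left and B blocks one step up (wraparound).
        Ab = [[Ab[i][(j + 1) % p] for j in range(p)] for i in range(p)]
        Bb = [[Bb[(i + 1) % p][j] for j in range(p)] for i in range(p)]
    return np.block(Cb)
```

On a real mesh each shift is a nearest-neighbor communication, so with n×n processors (p = n, b = 1) the p compute/shift rounds give the O(n) running time cited above.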
The advantage of using a linear transformation in designing the systolic array for matrix multiplication is that, by applying Theorem 2, one can determine the number of PEs in the corresponding systolic array.
Summary
Most parallel algorithms for matrix multiplication use a matrix decomposition that is based on the number of available processors. This includes the systolic algorithm (Choi et al., 1992), Cannon's algorithm (Alpatov et al., 1997), Fox and Otto's algorithm (Agarwal et al., 1995), PUMMA (Parallel Universal Matrix Multiplication) (Choi et al., 1994), SUMMA (Scalable Universal Matrix Multiplication) (Cannon, 1969) and DIMMA (Distribution Independent Matrix Multiplication) (Chtchelkanova et al., 1995). The standard method for n×n matrix multiplication uses O(n³) operations (multiplications). The aim is to develop highly parallel algorithms that have a cost lower than O(n³).
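The standard O(n³) method referred to above is the familiar triple loop, which performs n³ scalar multiplications; a minimal sketch (the helper name `matmul_naive` is an assumption of this example):

```python
def matmul_naive(A, B):
    """Standard O(n^3) matrix multiplication: n^3 scalar multiplications."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C
```

Parallel algorithms such as those listed above reduce the running time below O(n³) by distributing these n³ multiplications across processors, not by reducing their total count.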