Abstract

In this study we analyze how to select, for a given matrix-matrix multiplication operation, the algorithm best suited to deliver high throughput in minimum time. A comparative analysis and performance evaluation of several algorithms is carried out using identical performance parameters.

Highlights

  • Most of the parallel algorithms for matrix multiplication use matrix decomposition that is based on the number of processors available

  • Matrix multiplication can be performed in O(n) time on a mesh with wraparound connections and n×n processors (Cannon, 1969); in O(log n) time on a three-dimensional mesh of trees with n³ processors (Leighton, 1992); and in O(log n) time on a hypercube or shuffle-exchange network with n³ processors (Dekel et al, 1983); a serial sketch of Cannon's shift-and-multiply scheme is given after this list

  • The advantage of using a linear transformation in designing a systolic array for matrix multiplication is that, by Theorem 2, the number of PEs in the corresponding systolic array can be determined

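To make the O(n) bound in the second highlight concrete, the following minimal Python sketch simulates Cannon's algorithm serially, with one matrix element standing in for each of the n×n processors on the wraparound mesh. The function name cannon_matmul and the NumPy-based simulation are illustrative assumptions, not code from the paper.

```python
import numpy as np

def cannon_matmul(A, B):
    """Serial simulation of Cannon's algorithm on an n x n torus,
    one matrix element per virtual processor (illustrative sketch).

    After an initial skew (row i of A shifted left by i, column j of B
    shifted up by j), n shift-multiply-accumulate rounds produce C = A @ B,
    which is why the algorithm takes O(n) time on n x n processors."""
    n = A.shape[0]
    A = A.astype(float).copy()
    B = B.astype(float).copy()
    C = np.zeros((n, n))

    # Initial alignment: skew row i of A left by i and column j of B up by j.
    for i in range(n):
        A[i, :] = np.roll(A[i, :], -i)
    for j in range(n):
        B[:, j] = np.roll(B[:, j], -j)

    # n rounds of local multiply-accumulate followed by unit circular shifts.
    for _ in range(n):
        C += A * B                  # each "processor" (i, j) does c_ij += a_ij * b_ij
        A = np.roll(A, -1, axis=1)  # shift A one step left along each row
        B = np.roll(B, -1, axis=0)  # shift B one step up along each column
    return C

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, B = rng.random((4, 4)), rng.random((4, 4))
    assert np.allclose(cannon_matmul(A, B), A @ B)
```

The n iterations of the loop correspond to the n communication steps on the mesh, which is where the O(n) bound comes from.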

Introduction

Most of the parallel algorithms for matrix multiplication use a matrix decomposition that is based on the number of processors available. These include the systolic algorithm (Choi et al, 1992), Cannon's algorithm (Alpatov et al, 1997), Fox and Otto's algorithm (Agarwal et al, 1995), PUMMA (Parallel Universal Matrix Multiplication Algorithms) (Choi et al, 1994), SUMMA (Scalable Universal Matrix Multiplication Algorithm) (Cannon, 1969) and DIMMA (Distribution-Independent Matrix Multiplication Algorithm) (Chtchelkanova et al, 1995). The standard method for n×n matrix multiplication uses O(n³) operations (multiplications). The aim is to develop highly parallel algorithms whose cost is lower than O(n³).
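As a minimal illustration of the decomposition the introduction describes, the sketch below partitions n×n matrices into blocks across a √p × √p processor grid and then accumulates C block by block in a SUMMA-like fashion. The function names and the serialized "broadcast" loop are our own assumptions, not the implementation of any of the cited algorithms.

```python
import math
import numpy as np

def block_decompose(M, grid):
    """Split an n x n matrix into a grid x grid array of equal blocks,
    the 2D decomposition shared (in various forms) by Cannon-, Fox-,
    PUMMA-, SUMMA- and DIMMA-style algorithms. Assumes grid divides n."""
    n = M.shape[0]
    b = n // grid
    return [[M[i * b:(i + 1) * b, j * b:(j + 1) * b] for j in range(grid)]
            for i in range(grid)]

def summa_like_matmul(A, B, p):
    """Serial sketch of a SUMMA-style schedule on a sqrt(p) x sqrt(p) grid:
    in step k the owners of block column k of A and block row k of B would
    broadcast along their rows and columns; here the broadcast is a loop."""
    grid = int(math.isqrt(p))
    Ab, Bb = block_decompose(A, grid), block_decompose(B, grid)
    b = A.shape[0] // grid
    Cb = [[np.zeros((b, b)) for _ in range(grid)] for _ in range(grid)]

    for k in range(grid):              # one outer step per block row/column
        for i in range(grid):
            for j in range(grid):
                # "processor" (i, j) accumulates its local block of C
                Cb[i][j] += Ab[i][k] @ Bb[k][j]
    return np.block(Cb)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A, B = rng.random((6, 6)), rng.random((6, 6))
    assert np.allclose(summa_like_matmul(A, B, p=9), A @ B)
```

How the n×n matrix is cut into blocks is driven entirely by the number of processors p, which is the common starting point of the algorithms listed above.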

