N latency 2N I/O‐bandwidth 2D‐array matrix multiplication algorithm

A.K Oudjida, S Titri, M Hamarlain

doi:10.1108/03321640210423298

Abstract

The emergence of the systolic paradigm in 1978 inspired the first 2D‐array parallelization of the sequential matrix multiplication algorithm. Since then, owing to its attractive features, the systolic approach has gained such momentum that all 2D‐array parallelization attempts have been exclusively systolic. As a result, latency has been reduced successively (5N, 3N, 2N, 3N/2), where N is the matrix size. But as latency decreased, further irregularities were introduced into the array, severely compromising implementation at either the VLSI level or the system level. The best illustrations of such irregularities are the two designs proposed by Tsay and Chang in 1995, considered the fastest designs (3N/2) developed so far. The purpose of this paper is twofold. We first demonstrate that N+√N/2 is the minimal latency achievable with the systolic approach. We then introduce a full‐parallel 2D‐array algorithm with N latency and 2N I/O‐bandwidth. This novel algorithm is not only the fastest but also the most regular. A 3D parallel version with O(log N) latency is also presented.
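The abstract does not describe how the full‐parallel algorithm works. As a rough illustration only, the sketch below simulates one plausible way an N×N array of multiply‐accumulate cells could reach N steps with 2N words of input per step: at step k, column k of A and row k of B are broadcast to all cells (an outer‐product formulation). The broadcast scheme and the function name `outer_product_matmul` are assumptions for illustration, not the authors' confirmed design.

```python
def outer_product_matmul(a, b):
    """Simulate an N x N grid of multiply-accumulate cells.

    Assumption (not from the abstract): at step k, column k of A and
    row k of B are broadcast to the grid, i.e. 2N input words per step.
    Cell (i, j) accumulates a[i][k] * b[k][j].
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    steps = 0
    words_per_step = []

    for k in range(n):
        col = [a[i][k] for i in range(n)]  # N words of A enter the array
        row = list(b[k])                   # N words of B enter the array
        words_per_step.append(len(col) + len(row))

        # In hardware all N*N cells would update in the same step;
        # the nested loops here only simulate that parallel update.
        for i in range(n):
            for j in range(n):
                c[i][j] += col[i] * row[j]
        steps += 1

    return c, steps, words_per_step


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    product, steps, words = outer_product_matmul(a, b)
    print(product)  # [[19, 22], [43, 50]]
    print(steps)    # 2 steps for N = 2
    print(words)    # [4, 4], i.e. 2N words per step
```

Under these assumptions the step count equals N and each step consumes 2N input words, which is consistent with the figures quoted in the title; the sketch says nothing about the VLSI regularity or the 3D O(log N) variant.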

Journal: COMPEL - The international journal for computation and mathematics in electrical and electronic engineering
Publication Date: Sep 1, 2002
Citations: 6