Abstract

The authors describe a family of linear systolic arrays for matrix multiplication that exhibits a tradeoff between local storage and the number of processing elements (PEs). The design consists of processors connected in a linear array, each with local storage s, 1 ≤ s ≤ n, for n × n matrix multiplication; the number of processors is n⌈n/s⌉, where ⌈n/s⌉ is the least integer ≥ n/s. The input matrices are fed as two-speed data streams, using fast and slow channels, to satisfy the dependencies in the usual matrix multiplication algorithm. Although a family of linear arrays has previously been synthesized for this problem, this technique leads to simpler designs with fewer processors and a shorter delay from input to output. All of these designs use the optimal number of processors for local storage in the range 1 ≤ s ≤ n. The data flow is unidirectional, which makes the designs implementable on fault-tolerant wafer-scale integration models.
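The storage/processor tradeoff stated above can be tabulated directly from the formula n⌈n/s⌉. The following is a minimal sketch that only evaluates that count; it does not model the arrays' data flow or timing, and the function names `processor_count` and `tradeoff_table` are illustrative, not taken from the paper.

```python
def processor_count(n: int, s: int) -> int:
    """PEs needed for n x n multiplication with storage s per PE: n * ceil(n / s)."""
    if not 1 <= s <= n:
        raise ValueError("storage s must satisfy 1 <= s <= n")
    # -(-n // s) is integer ceiling division, avoiding floating point.
    return n * -(-n // s)


def tradeoff_table(n: int) -> list[tuple[int, int]]:
    """List (s, processor count) for every admissible storage size s."""
    return [(s, processor_count(n, s)) for s in range(1, n + 1)]


if __name__ == "__main__":
    for s, pes in tradeoff_table(4):
        print(f"s={s}: {pes} PEs")
```

For n = 4 the count falls from 16 PEs at s = 1 to 4 PEs at s = 4, with s = 2 and s = 3 both giving 8 because ⌈4/2⌉ = ⌈4/3⌉ = 2.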
