Abstract

Compute-bound problems such as matrix-matrix multiplication can be accelerated using special-purpose hardware such as systolic arrays (SAs). However, the processing elements in SAs have a long critical path delay, which limits the performance benefits of SAs. This paper presents a scheme to achieve high-performance matrix multiplication using SAs. Two approximate matrix multiplier designs (Ax1 and Ax2) with variable accuracy and power are proposed. The proposed 8-bit designs improve critical path delay by 32%, and for the scaled-up 32-bit variants the improvements in delay and energy reach up to 64% and 51%, respectively. Moreover, Ax1 and Ax2 have a lower power-delay product than previous approximate matrix multiplier designs. This improves the resolution of the prior accuracy-energy Pareto front; therefore, we define a new Pareto front for approximate matrix multipliers. As a case study, the discrete cosine transform is evaluated. Ax2 achieves the best quality-power trade-off, exhibiting a 5% degradation in structural similarity index (SSIM) with a power saving of 28%.
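
For readers unfamiliar with systolic arrays, the following is a minimal software sketch of how an output-stationary SA accumulates a matrix product cycle by cycle, and of how swapping the multiplier inside each processing element trades accuracy for cost. It is an illustration under stated assumptions, not the Ax1 or Ax2 design: the names `systolic_matmul`, `exact_mul`, and `truncated_mul` are hypothetical, and operand truncation is used here only as a generic stand-in for an approximate multiplier.

```python
def exact_mul(a, b):
    """Reference multiplier used by each processing element (PE)."""
    return a * b


def truncated_mul(a, b, drop_bits=2):
    """Hypothetical approximate multiplier: zero the low bits of each operand.

    This is a generic stand-in, NOT the paper's Ax1 or Ax2 design.
    """
    mask = ~((1 << drop_bits) - 1)
    return (a & mask) * (b & mask)


def systolic_matmul(A, B, mul=exact_mul):
    """Simulate an output-stationary systolic array computing C = A x B.

    A is n x k and B is k x m, given as lists of lists of integers.
    PE(i, j) holds C[i][j]. With skewed inputs, the operand pair
    A[i][s], B[s][j] reaches PE(i, j) at cycle t = s + i + j.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    # The last operand pair arrives at cycle (n - 1) + (m - 1) + (k - 1).
    for t in range(n + m + k - 2):
        for i in range(n):
            for j in range(m):
                s = t - i - j  # index of the operand pair arriving now
                if 0 <= s < k:
                    C[i][j] += mul(A[i][s], B[s][j])
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(systolic_matmul(A, B))                     # [[19, 22], [43, 50]]
print(systolic_matmul(A, B, mul=truncated_mul))  # coarser result, cheaper PEs in hardware
```

In hardware, every PE performs one multiply-accumulate per cycle, so the multiplier sits on the critical path; this is why a cheaper approximate multiplier can shorten the delay at the price of output error.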
