Abstract

High-performance sparse matrix multipliers are essential for deep learning applications, and as big data analytics continues to evolve, specialized accelerators are needed to handle sparse matrix operations efficiently. This paper proposes a modified, configurable, outer-product-based architecture for sparse matrix multiplication and explores the design space of the proposed architecture. The performance of various architecture configurations has been examined for input samples with similar characteristics. The proposed architecture has been implemented on a Xilinx Kintex-7 FPGA using 32-bit single-precision floating-point arithmetic as well as 8-bit, 16-bit, and 32-bit fixed-point formats. The effect of quantization in the proposed architecture has been analyzed extensively and the results have been reported. The performance of the proposed architecture has been compared with state-of-the-art implementations, and a 9.21% improvement in performance has been observed.
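To make the outer-product formulation mentioned in the abstract concrete, the sketch below illustrates the general idea in software: the product C = A·B is computed as a sum of outer products between column k of A (stored in CSC) and row k of B (stored in CSR), followed by a merge of the partial products. This is a minimal illustrative example of the standard outer-product SpGEMM scheme, not the paper's FPGA architecture; the function name and the dictionary-based merge are assumptions made purely for illustration.

```python
import numpy as np
from scipy.sparse import random as sparse_random, csc_matrix, csr_matrix


def outer_product_spgemm(A, B):
    """Illustrative outer-product SpGEMM: C = sum_k A[:, k] (outer) B[k, :].

    Each k yields a partial-product matrix; hardware accelerators typically
    parallelize this multiply phase across k and pipeline the merge phase.
    Here the merge is modeled with a simple (row, col) -> value dictionary.
    """
    A = csc_matrix(A)   # column-wise access to A
    B = csr_matrix(B)   # row-wise access to B
    n_rows, inner = A.shape
    _, n_cols = B.shape

    acc = {}  # stands in for the merge/accumulate stage
    for k in range(inner):
        a_slice = slice(A.indptr[k], A.indptr[k + 1])  # non-zeros of column k of A
        b_slice = slice(B.indptr[k], B.indptr[k + 1])  # non-zeros of row k of B
        for r, av in zip(A.indices[a_slice], A.data[a_slice]):
            for c, bv in zip(B.indices[b_slice], B.data[b_slice]):
                acc[(r, c)] = acc.get((r, c), 0.0) + av * bv

    C = np.zeros((n_rows, n_cols))
    for (r, c), v in acc.items():
        C[r, c] = v
    return C


# Quick check against SciPy's reference sparse multiplication
A = sparse_random(6, 5, density=0.3, format="csc", random_state=0)
B = sparse_random(5, 7, density=0.3, format="csr", random_state=1)
assert np.allclose(outer_product_spgemm(A, B), (A @ B).toarray())
```

A key property of this formulation, and the reason it suits hardware acceleration, is that the multiply phase touches each non-zero of A and B exactly once with no index matching, deferring all irregular work to the merge of partial products.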
