High performance and memory efficient implementation of matrix multiplication on FPGAs

Guiming Wu, Miao Wang, Yong Dou

doi:10.1109/fpt.2010.5681769

Publication Date: Dec 1, 2010

Citations: 19

Affiliation: National University of Defense Technology, Institute of Computing Technology

#Implementation Of Matrix Multiplication #FPGA Devices

Abstract

We present a high-performance, memory-efficient hardware implementation of matrix multiplication for dense matrices of any size on FPGA devices. By applying a series of transformations and optimizations to the original serial algorithm, we obtain an I/O- and memory-optimized block algorithm for matrix multiplication on FPGAs. A linear array of processing elements (PEs) is proposed to implement this block algorithm. We show a significant reduction in hardware resource consumption compared with related work, while increasing clock frequency. Moreover, the memory requirement is reduced from O(S^2) to O(S), where S is the block size. Therefore, more PEs can be integrated into the same FPGA device.
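
To make the idea of a block algorithm concrete, the following is a minimal software sketch of generic blocked matrix multiplication. It is not the authors' hardware design, which the abstract does not describe in detail. The function names `matmul_naive` and `matmul_blocked`, the parameter `s` (standing in for the block size S), and the choice to keep one S-word accumulator per output row are all assumptions made for illustration only.

```python
def matmul_naive(a, b):
    """Reference serial algorithm: C[i][j] = sum over p of A[i][p] * B[p][j]."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_blocked(a, b, s):
    """Multiply a (n x k) by b (k x m), producing C in tiles of at most s x s.

    Illustrative assumption: each row of a tile is handled by one
    hypothetical processing element that keeps only an accumulator of at
    most s words, rather than buffering a full s x s tile.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, s):            # tile row origin
        for j0 in range(0, m, s):        # tile column origin
            width = min(j0 + s, m) - j0  # edge tiles may be narrower than s
            for i in range(i0, min(i0 + s, n)):
                acc = [0] * width        # O(s) working storage for this row
                for p in range(k):
                    a_ip = a[i][p]       # one element of A, reused across the row
                    for jj in range(width):
                        acc[jj] += a_ip * b[p][j0 + jj]
                c[i][j0:j0 + width] = acc
    return c
```

For example, `matmul_blocked([[1, 2], [3, 4]], [[5, 6], [7, 8]], 1)` returns the same product as `matmul_naive` on the same inputs. Matrix dimensions need not be multiples of `s`, since edge tiles are simply truncated.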
