Abstract

Efficient FPGA-based floating-point singular value decomposition (SVD) is challenging because its computational complexity grows rapidly with the matrix dimension. Numerous hardware architectures have been proposed to improve SVD performance by increasing the capacity of computation units, reusing data, and enhancing bandwidth. These designs, however, are not optimal because of their low parallelism, poor data access efficiency, and inefficient iteration scheduling. In this express, we propose a block column vector Hestenes-Jacobi (BCV Jacobi) algorithm that decomposes an arbitrarily large matrix into several blocks, improves access efficiency through a customized data structure, and increases system-level parallelism by simplifying iteration scheduling. The proposed BCV Jacobi algorithm also achieves better scalability and efficiency. Experimental results show that the proposed FPGA-based SVD processor outperforms other SVD implementations in terms of parallelism, data access efficiency, supported matrix size, and execution time. Compared with a state-of-the-art SVD accelerator engine, the proposed algorithm achieves an average speedup of more than 2×.
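To make the underlying kernel concrete, the following is a minimal pure-Python sketch of the classic one-sided (Hestenes) Jacobi SVD, with the columns grouped into fixed-size blocks and column pairs visited block pair by block pair. It is not the authors' BCV Jacobi algorithm: the function name `hestenes_jacobi_svd`, the consecutive-column block layout, and the parameters `block_size`, `tol`, and `max_sweeps` are hypothetical illustrations, and no FPGA data structure or parallel scheduling is modeled.

```python
import math


def hestenes_jacobi_svd(a, block_size=2, tol=1e-12, max_sweeps=30):
    """Return the singular values of `a` (a list of rows, m >= n assumed), descending.

    One-sided (Hestenes) Jacobi: repeatedly rotate pairs of columns until all
    columns are mutually orthogonal; the column norms are then the singular values.
    """
    m, n = len(a), len(a[0])
    # Column-major working copy: cols[j] holds column j of A.
    cols = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    # Hypothetical block layout: consecutive groups of `block_size` columns.
    blocks = [list(range(s, min(s + block_size, n)))
              for s in range(0, n, block_size)]

    for _ in range(max_sweeps):
        rotated = False
        # Visiting every block pair (bi <= bj) covers each column pair once per sweep.
        for bi in range(len(blocks)):
            for bj in range(bi, len(blocks)):
                for p in blocks[bi]:
                    for q in blocks[bj]:
                        if q <= p:
                            continue  # within a diagonal block, take each pair once
                        cp, cq = cols[p], cols[q]
                        alpha = sum(x * x for x in cp)
                        beta = sum(y * y for y in cq)
                        gamma = sum(x * y for x, y in zip(cp, cq))
                        # Skip pairs that are already orthogonal to working precision.
                        if abs(gamma) <= tol * math.sqrt(alpha * beta):
                            continue
                        rotated = True
                        # Rotation that zeroes the inner product of columns p and q
                        # (the smaller root of t^2 + 2*zeta*t - 1 = 0).
                        zeta = (beta - alpha) / (2.0 * gamma)
                        t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                        c = 1.0 / math.sqrt(1.0 + t * t)
                        s = c * t
                        cols[p] = [c * x - s * y for x, y in zip(cp, cq)]
                        cols[q] = [s * x + c * y for x, y in zip(cp, cq)]
        if not rotated:
            break  # a full sweep made no rotation: columns are orthogonal

    # Column norms of the orthogonalized matrix are the singular values.
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)
```

For example, `hestenes_jacobi_svd([[3.0, 0.0], [4.0, 5.0]])` returns values close to √45 ≈ 6.708 and √5 ≈ 2.236. Rotations that touch disjoint column pairs are independent of one another, which is the property hardware SVD designs typically exploit to process several pairs in parallel.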
