Abstract

Efficient FPGA-based floating-point singular value decomposition (SVD) is challenging because its computational complexity grows rapidly with the matrix dimension. Numerous hardware architectures have been proposed to improve SVD performance by increasing the capacity of the computation units, reusing data, and enhancing bandwidth. These designs, however, are not optimal because of their low parallelism, poor data access efficiency, and inefficient iteration scheduling. In this express brief, we propose a block column vector Hestenes-Jacobi (BCV Jacobi) algorithm that decomposes an arbitrarily large matrix into several blocks, enhances access efficiency by customizing a distinctive data structure, and improves system-level parallelism by simplifying the iteration scheduling. The proposed BCV Jacobi algorithm also achieves better scalability and efficiency. Experimental results show that the proposed FPGA-based SVD processor is superior to other SVD implementations in terms of parallelism, data access efficiency, supported matrix size, and execution time. Compared with a state-of-the-art SVD accelerator engine, the proposed algorithm achieves a speedup of more than $2\times$ on average.
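
For readers unfamiliar with the underlying method, the following is a minimal sketch of the classical one-sided Hestenes-Jacobi iteration on which the BCV Jacobi algorithm builds. It is not the blocked BCV variant and not the FPGA design described here, neither of which is specified in this abstract. The function name `hestenes_jacobi_singular_values` and the parameters `sweeps` and `tol` are illustrative assumptions. Each step applies a plane rotation to one pair of columns so that they become orthogonal; once all pairs are orthogonal, the column norms are the singular values.

```python
import math


def hestenes_jacobi_singular_values(a, sweeps=30, tol=1e-12):
    """Singular values of `a` (a list of rows), in descending order.

    Classical one-sided Hestenes-Jacobi sketch (illustrative only):
    column pairs are rotated until every pair is numerically orthogonal.
    """
    m, n = len(a), len(a[0])
    # Store the matrix by columns, since each rotation touches two columns.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]

    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])  # squared norm of column p
                beta = sum(x * x for x in cols[q])   # squared norm of column q
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # this pair is already orthogonal
                rotated = True
                # Rotation angle that zeroes the inner product of the pair.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                new_p = [c * x - s * y for x, y in zip(cols[p], cols[q])]
                new_q = [s * x + c * y for x, y in zip(cols[p], cols[q])]
                cols[p], cols[q] = new_p, new_q
        if not rotated:
            break  # a full sweep needed no rotation: converged

    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)
```

Because rotations on disjoint column pairs are independent of one another, they can in principle run concurrently, which is the kind of parallelism that the iteration scheduling mentioned in the abstract is concerned with.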
