Abstract

Many FPGA implementations of QR decomposition have been studied for small-scale matrices, and each of them has been presented individually. However, to the best of our knowledge, there is no FPGA-based accelerator for large-scale QR decomposition. In this paper, we propose a unified FPGA accelerator structure for large-scale QR decomposition. To exploit the computational potential of the FPGA, we introduce a fine-grained parallel algorithm for QR decomposition. A scalable linear array of processing elements (PEs), which is the core component of the FPGA accelerator, is proposed to implement this algorithm. A total of 15 PEs can be integrated into an Altera Stratix II EP2S130F1020C5 on our self-designed board. Experimental results show that a speedup of a factor of 4 and a maximum power-performance of 60.9 can be achieved compared to a Pentium Dual CPU with two SSE threads.
