Abstract

Matrix factorization decomposes a matrix into a product of smaller matrices and is widely used in machine learning algorithms. Many matrix decomposition algorithms exist, each with its own range of applications. CUR matrix decomposition is a widely used factorization tool that has been employed for dimensionality reduction and pattern recognition in many scientific and engineering applications, such as image processing, text mining, and wireless communications. In this paper we propose an efficient FPGA-based floating-point accelerator for the CUR decomposition algorithm, designed using high-level synthesis (HLS). Our experimental results demonstrate the improved efficiency of our hardware design compared to optimized CPU-based software solutions: the speedup of the FPGA-based architecture over the optimized software implementation ranges from 2.37 to 16.82 times across different dimensions of the input data matrix. We evaluated our design on large matrices of size 1024 x 1024 and 2048 x 2048, and the results demonstrate its efficiency in terms of utilized resources and latency. Finally, we compared our design with other matrix decomposition algorithms, namely SVD and QR decomposition; the results show that CUR is more efficient than both in terms of latency and required resources.
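
The paper targets an HLS-based FPGA implementation; purely as an illustration of what the CUR factorization computes, the following is a minimal NumPy sketch (not the authors' hardware design) that approximates A ≈ C U R by sampling actual columns and rows of A and forming a pseudoinverse-based linking matrix U. The norm-based sampling scheme and the function name cur_decompose are assumptions made for this example only.

```python
import numpy as np

def cur_decompose(A, c, r, seed=0):
    """Approximate A ~= C @ U @ R by sampling c columns and r rows of A.

    Columns and rows are drawn with probability proportional to their
    squared Euclidean norms (a common, but here assumed, sampling rule);
    U is built from pseudoinverses so that C @ U @ R approximates A.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape

    total = np.sum(A ** 2)
    col_p = np.sum(A ** 2, axis=0) / total   # column sampling probabilities
    row_p = np.sum(A ** 2, axis=1) / total   # row sampling probabilities

    col_idx = rng.choice(n, size=c, replace=False, p=col_p)
    row_idx = rng.choice(m, size=r, replace=False, p=row_p)

    C = A[:, col_idx]                                   # m x c, actual columns of A
    R = A[row_idx, :]                                   # r x n, actual rows of A
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)       # c x r linking matrix

    return C, U, R

# Usage: low-rank approximation of a random test matrix
A = np.random.default_rng(1).standard_normal((1024, 1024))
C, U, R = cur_decompose(A, c=64, r=64)
err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
print(f"relative approximation error: {err:.3f}")
```

Because C and R are actual columns and rows of the input, the factors remain interpretable in the original data space, which is one reason CUR is favored over SVD for pattern-recognition tasks such as those cited in the abstract.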
