Sparse Matrix-Vector Multiplication (SpMV) is an essential operation in scientific and engineering fields, with applications in areas like finite element analysis, image processing, and machine learning. To address the need for faster and more energy-efficient computing, this paper investigates the acceleration of SpMV through Field-Programmable Gate Arrays (FPGAs), leveraging High-Level Synthesis (HLS) for design simplicity. Our study focuses on the AMD-Xilinx Alveo U280 FPGA, assessing the performance of the SpMV kernel from Vitis Libraries, which is the state of the art on SpMV acceleration on FPGAs. We explore kernel modifications, transition to single precision, and varying partition sizes, demonstrating the impact of these changes on execution time. Furthermore, we investigate matrix preprocessing techniques, including Reverse Cuthill-McKee (RCM) reordering and a hybrid sparse storage format, to enhance efficiency. Our findings reveal that the performance of FPGA-accelerated SpMV is influenced by matrix characteristics, by smaller partition sizes, and by specific preprocessing techniques delivering notable performance improvements. By selecting the best results from these experiments, we achieved execution time enhancements of up to 3.2×. This study advances the understanding of FPGA-accelerated SpMV, providing insights into key factors that impact performance and potential avenues for further improvement.
Read full abstract