Abstract

Previous research has shown that floating-point kernels mapped onto field programmable gate array (FPGA)-based high performance reconfigurable computers (HPRCs) must satisfy a variety of heuristics and rules of thumb to achieve a speedup over their software counterparts. One such rule of thumb is that applications with large or irregular stride memory access, e.g., sparse matrix kernels, can run significantly faster on HPRCs. This paper uses a simple sparse matrix Jacobi iterative solver to demonstrate why this speedup can occur. Using a well-known off-the-shelf sustained bandwidth measurement tool and a port of that tool to an FPGA-based computer, the paper shows that, unlike cache-based general purpose processors, FPGA-based processors do not suffer significant bandwidth degradation at large data sizes. The paper then validates these observations with both experimentally measured and analytically derived runtimes for the Jacobi solver. The results confirm that 1) unlike that of a cache-based general purpose processor, the FPGA bandwidth is constant across the entire range of considered sparse data sets, and 2) the experimentally determined runtimes for both the software and FPGA-based Jacobi kernels are in very close agreement with the analytically derived runtimes.
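To make the irregular memory access concrete, here is a minimal sketch of a sparse Jacobi sweep in Python. It is an illustration only, not the paper's implementation: the compressed sparse row (CSR) layout, the function name `jacobi_csr`, the fixed iteration count, and the small test system are all assumptions, since the abstract does not state the storage format or stopping criterion used.

```python
def jacobi_csr(values, col_idx, row_ptr, b, iterations=50):
    """Approximate the solution of A x = b with a fixed number of Jacobi sweeps.

    A is given in CSR form (an assumed layout, not taken from the paper):
      values  - nonzero entries, row by row
      col_idx - column index of each nonzero
      row_ptr - row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    Assumes every row stores a nonzero diagonal entry.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x_new = [0.0] * n
        for i in range(n):
            diag = 0.0
            acc = b[i]
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = values[k]
                else:
                    # x[j] is read through col_idx: this indirect access is the
                    # large or irregular stride pattern that defeats caches.
                    acc -= values[k] * x[j]
            x_new[i] = acc / diag
        x = x_new
    return x


# Hypothetical 3x3 diagonally dominant system:
# A = [[4, 1, 0], [1, 4, 1], [0, 1, 4]], b = [5, 6, 5], exact solution [1, 1, 1].
values = [4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
b = [5.0, 6.0, 5.0]
solution = jacobi_csr(values, col_idx, row_ptr, b)
```

In this sketch the reads of `values` and `col_idx` are sequential, while the reads of `x` jump around according to the sparsity pattern. For large matrices those scattered reads are the kind of access for which the abstract reports bandwidth degradation on cache-based processors.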
