Analytical and measured sustained bandwidth for an FPGA-based processor

Gerald R Morris, Khalid H Abed, Antoinette R Silas

doi:10.1109/secon.2012.6196914

Abstract

Previous research has shown that floating-point kernels mapped onto field programmable gate array (FPGA)-based high-performance reconfigurable computers (HPRCs) must satisfy a variety of heuristics and rules of thumb to achieve a speedup over their software counterparts. One such rule of thumb is that applications with large or irregular stride memory access, e.g., sparse matrix kernels, can run significantly faster on HPRCs. This paper uses a simple sparse matrix Jacobi iterative solver to demonstrate why this speedup can occur. Using a well-known off-the-shelf sustained bandwidth measurement tool and a port of that tool onto an FPGA-based computer, the paper shows that, unlike cache-based general-purpose processors, FPGA-based processors do not suffer significant bandwidth degradation at large data sizes. The paper then validates these observations with both experimentally measured and analytically derived runtimes for the Jacobi solver. The results confirm that 1) unlike that of a cache-based general-purpose processor, the FPGA bandwidth is constant across the entire range of sparse data sets considered, and 2) the experimentally determined runtimes for both the software and FPGA-based Jacobi kernels are in very close agreement with the analytically derived runtimes.
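To illustrate the kind of kernel the abstract refers to, the following is a minimal sketch of a sparse Jacobi iteration. It is not the authors' implementation: the compressed sparse row (CSR) storage format, the function name `jacobi_csr`, and the fixed iteration count are all assumptions made for illustration. The point of interest is the indirect read `x[col_idx[k]]`, which produces the irregular-stride memory access pattern that tends to defeat caches at large data sizes.

```python
def jacobi_csr(values, col_idx, row_ptr, b, iterations=50):
    """Approximate the solution of A x = b with a fixed number of Jacobi sweeps.

    A is given in CSR form (values, col_idx, row_ptr); this layout is an
    assumption for illustration, not taken from the paper.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x_new = [0.0] * n
        for i in range(n):
            diag = 0.0
            acc = b[i]
            # Walk the nonzeros of row i.
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = values[k]
                else:
                    # Indirect read of x: the stride depends on the sparsity pattern.
                    acc -= values[k] * x[j]
            x_new[i] = acc / diag
        x = x_new
    return x


# Small diagonally dominant example: A = [[4,1,0],[1,4,1],[0,1,4]], b = [5,6,5].
values = [4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
b = [5.0, 6.0, 5.0]
x = jacobi_csr(values, col_idx, row_ptr, b)
```

Each sweep streams `values` and `col_idx` sequentially but gathers from `x` at data-dependent addresses, so the sustained memory bandwidth available for that gather largely determines the runtime.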

Full Text