Optimizing SDRAM bandwidth for custom FPGA loop accelerators

Samuel Bayliss, George A Constantinides

doi:10.1145/2145694.2145727

Abstract

Memory bandwidth is critical to achieving high performance in many FPGA applications. The bandwidth of SDRAM memories is, however, highly dependent upon the order in which addresses are presented on the SDRAM interface. We present an automated tool for constructing an application-specific on-chip memory address sequencer, which presents requests to the external memory in an order that optimizes off-chip memory bandwidth for a fixed on-chip memory resource. For the class of algorithms described by affine loop nests, this approach can be shown to reduce both the number of requests made to external memory and the overhead associated with those requests. The data presented show a trade-off between the use of on-chip resources and achievable off-chip memory bandwidth: efficiency on the external memory interface improves by 3.6x to 4x, at a cost of up to a 1.4x increase in the ALUTs dedicated to address generation circuits in an Altera Stratix III device.
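
To illustrate why address order matters, the following is a minimal sketch of a single-bank, open-row SDRAM model applied to a small affine loop nest. It is not the tool described in this paper. The row size, the array dimensions, the function names, and the use of sorting as a stand-in for a reordering sequencer are all assumptions made for illustration. The model only counts row activations (the overhead incurred when a request falls in a different row from the previous one) and does not model request reduction through on-chip reuse.

```python
def count_row_activations(addresses, row_words):
    """Count row activations in a hypothetical single-bank open-row model.

    A new activation is charged whenever a request targets a different
    SDRAM row from the previous request.
    """
    open_row = None
    activations = 0
    for addr in addresses:
        row = addr // row_words
        if row != open_row:
            activations += 1
            open_row = row
    return activations


def column_walk(n_rows, n_cols):
    """Addresses generated by the affine loop nest
        for j in range(n_cols):
            for i in range(n_rows):
                read A[i][j]
    with A stored in row-major order."""
    return [i * n_cols + j for j in range(n_cols) for i in range(n_rows)]


# Illustrative sizes (assumed): an 8 x 16 array, 16 words per SDRAM row.
N_ROWS, N_COLS, ROW_WORDS = 8, 16, 16

original = column_walk(N_ROWS, N_COLS)
# Stand-in for a reordering sequencer: issue the same requests in
# ascending address order, buffering the data on chip.
reordered = sorted(original)

print(count_row_activations(original, ROW_WORDS))   # 128
print(count_row_activations(reordered, ROW_WORDS))  # 8
```

In this toy case the column-order walk activates a new row on every request, while the reordered sequence activates each row once. The price is on-chip storage to hold data until the loop body consumes it, which is the kind of trade-off between on-chip resources and off-chip efficiency that the abstract describes.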

Full Text