A pipelined-loop-compatible architecture and algorithm to reduce variable-length sets of floating-point data on a reconfigurable computer

Gerald R Morris,Viktor K Prasanna

doi:10.1016/j.jpdc.2008.03.004

Abstract

Reconfigurable computers (RCs) combine general-purpose processors (GPPs) with field programmable gate arrays (FPGAs). The FPGAs are reconfigured at run time to become application-specific processors that collaborate with the GPPs to execute the application. High-level language (HLL) to hardware description language (HDL) compilers allow the FPGA-based kernels to be generated using HLL-based programming rather than HDL-based hardware design. Unfortunately, the loops needed for floating-point reduction operations often cannot be pipelined by these HLL–HDL compilers. This capability gap prevents the development of a number of important FPGA-based kernels. This article describes a novel architecture and algorithm that allow the use of an HLL–HDL environment to implement high-performance FPGA-based kernels that reduce multiple, variable-length sets of floating-point data. A sparse matrix iterative solver is used to demonstrate the effectiveness of the reduction kernel. The FPGA-augmented version running on a contemporary RC is up to 2.4 times faster than the software-only version of the same solver running on the GPP. Conservative estimates show the solver will run up to 6.3 times faster than software on a next-generation RC.

Full Text