A Resource and Performance Optimization Reduction Circuit on FPGAs

Linhuai Tang, Jiamin Chen, Gang Cai, Yong Zheng

doi:10.1109/tpds.2020.3020117

Abstract

Reduce is a fundamental computing pattern that is widely used in scientific and engineering applications. For example, accumulation, the most common instance of the reduce pattern, is the core of applications such as dot products, matrix multiplication, and finite impulse response (FIR) filters. However, hardware implementations of the reduce pattern face a trade-off between performance and area. To address this trade-off, we propose an optimized reduction method that can handle multiple data sets of arbitrary length. The performance of the proposed method is evaluated for both a single data set and multiple data sets. Moreover, to quickly distinguish the data of different sets inside the reduction circuit, separate modules are designed to manage the data. We implement the design on FPGAs and present the experimental results. The proposed design combines high performance with low resource consumption and achieves an improvement of at least 1.59 times in area-time product compared with previously reported methods.
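To make the reduce pattern concrete, the following is a minimal software sketch in Python. It is not the circuit proposed in the paper. It only illustrates the pattern the abstract describes: a pairwise (tree-style) reduction, a dot product built on accumulation, and the reduction of several sets of different lengths. The names `tree_reduce`, `dot`, and `reduce_sets` are hypothetical and chosen for illustration.

```python
from operator import add


def tree_reduce(values, op=add):
    """Reduce a non-empty sequence by combining neighbours pairwise, level by level."""
    level = list(values)
    if not level:
        raise ValueError("cannot reduce an empty set")
    while len(level) > 1:
        # Combine elements (0,1), (2,3), ... into the next level.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            # An unpaired last element passes through unchanged.
            nxt.append(level[-1])
        level = nxt
    return level[0]


def dot(a, b):
    """Dot product expressed as an accumulation over element-wise products."""
    return tree_reduce([x * y for x, y in zip(a, b)])


def reduce_sets(sets, op=add):
    """Reduce several sets of arbitrary (non-zero) length, one result per set."""
    return [tree_reduce(s, op) for s in sets]


if __name__ == "__main__":
    print(dot([1, 2, 3], [4, 5, 6]))
    print(reduce_sets([[1, 2, 3], [10], [4, 5, 6, 7]]))
```

In software the sets arrive as separate lists, so telling them apart is trivial. In a pipelined hardware reduction, values from consecutive sets can be in flight at the same time, which is presumably why the abstract describes modules for managing data from different sets.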

Full Text