Abstract

Reduce is a fundamental computing pattern, which is widely involved in scientific and engineering applications. For example, accumulation, the most common example of reduce pattern, is the core of applications such as dot product, matrix multiplication, and finite impulse response (FIR) filter. However, there is a trade-off between performance and area in the hardware implementation of the reduce pattern. To solve this problem, we propose an optimized reduction method that can handle multiple arbitrary-length sets. The performance of the proposed method is evaluated for both a single data set and numerous data sets. Moreover, to quickly differentiate the data of different sets in the reduction circuit, individual modules are designed to manage the data. We implement the design on FPGAs and present the experimental results. The proposed design with high performance and low resource consumption can achieve at least 1.59 times improvement on area-time product compared with the reported methods.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.