Abstract

Modern processors contain several specialized hardware modules and multiple cores to ensure performance across a wide range of applications. In this context, FPGAs are frequently used as the implementation platform, since they offer architecture customization and fast time-to-market. However, many FPGAs may lack the resources to implement all the necessary features, because of cost constraints or the complexity of the system to be implemented. When some required functionality does not fit in the target device, it must be mapped to the much slower software domain. In this work, we exploit the fact that these designs usually underuse their available BRAMs and propose a low-cost hardware-based function reuse mechanism for FPGAs. The mechanism recovers some of the performance lost in the software part of applications that could not be implemented in hardware logic, with minimal impact on LUT usage. This is achieved by saving the inputs and outputs of the most frequently executed functions in a BRAM-based reuse table, so that subsequent executions of a function with the same arguments can be skipped. The mechanism supports both precise and approximate modes and is evaluated with a 4-issue VLIW processor implemented in HDL, also considering a multi-core environment. Precise reuse, in single-core and multi-core scenarios, is assessed by running applications that use a software library to emulate floating-point operations. Approximate reuse is evaluated on a single-core image-processing application that tolerates a certain level of error. Our scheme achieves a 1.39× geometric-mean speedup in the precise single-core scenario, while in the multi-core case application improvements rise from 1.25× to 1.9× once the cores share the reuse table. In the approximate scenario, we achieve a 1.52× speedup with less than 10% error.
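The reuse idea described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the class name `ReuseTable`, the direct-mapped organization, the table size, and the `drop_bits` parameter are all hypothetical. In particular, the abstract does not say how approximate matching is performed; here it is modeled by ignoring the low-order bits of each integer argument.

```python
class ReuseTable:
    """Software model of a function reuse table (hypothetical organization).

    Assumption: direct-mapped, one entry per index, integer arguments only.
    """

    def __init__(self, num_entries=256, drop_bits=0):
        self.num_entries = num_entries
        # drop_bits == 0 models precise reuse; drop_bits > 0 models an
        # assumed form of approximate reuse that ignores low-order bits.
        self.drop_bits = drop_bits
        self.entries = [None] * num_entries
        self.hits = 0
        self.misses = 0

    def _key(self, func_id, args):
        # Clear the lowest drop_bits bits of every argument before matching.
        mask = ~((1 << self.drop_bits) - 1)
        return (func_id,) + tuple(a & mask for a in args)

    def call(self, func_id, func, *args):
        key = self._key(func_id, args)
        index = hash(key) % self.num_entries
        entry = self.entries[index]
        if entry is not None and entry[0] == key:
            # Same function and (masked) arguments seen before: skip execution.
            self.hits += 1
            return entry[1]
        # Miss: execute the function and store its result for later reuse.
        self.misses += 1
        result = func(*args)
        self.entries[index] = (key, result)
        return result


def slow_mul(a, b):
    # Stand-in for an expensive software-emulated operation.
    return a * b
```

With `drop_bits=0`, a second `call(0, slow_mul, 6, 7)` returns the stored result without running `slow_mul` again. With `drop_bits=2`, a call with arguments `(9, 10)` would reuse the result stored for `(8, 8)`, trading accuracy for a higher hit rate. Sharing one such table among several cores corresponds to the multi-core scenario mentioned above.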
