Abstract

This paper presents an enhanced tool flow and hardware that allow a host CPU to exploit the timing margins available on an FPGA fabric to improve its performance or reduce its energy and power requirements. Two case studies demonstrate the performance gains and energy reductions possible in realistic scenarios. The first case study presents a video fusion system with hardware acceleration. The video fusion application is based on Dual-Tree Complex Wavelet Transforms (DT-CWT) that are mapped to a hardware accelerator using high-level synthesis tools. The hardware netlist is processed, and in-situ detectors are automatically added to monitor and pre-detect timing failures occurring in the critical-path flip-flops. In the second case study, the tool flow is extended to support cases where the critical paths terminate in memory blocks with internal registers hidden from the user. A soft-core multiprocessor implemented in the FPGA is used to illustrate the additional challenges and the proposed solution. In both cases, the host CPU can control the voltage and frequency of the FPGA and compute to the performance or energy limit, obtaining around a 70% increase in performance or reduction in energy. Intermediate solutions that trade different levels of performance for energy are also possible. The system exhibits excellent energy-proportional computing characteristics and can adapt its operating point to complete a task within a given time budget so that only the minimum level of energy is used.
