Abstract

Field Programmable Gate Arrays (FPGAs) are now widely adopted as hardware accelerators due to their inherent parallel processing capability. However, the sub-optimal logic utilization and large reconfiguration latency in conventional single-context FPGAs pose constraints on their usage for applications like adaptive control systems in vehicles, software defined radio where frequent context switching or resource sharing between tasks are required. As such, multi-context FPGAs with dynamic reconfiguration capability have been introduced with the aim to allow rapid reconfiguration of the FPGA, and hence increase the effective logic density. The current generation of multi-context FPGAs typically use a dynamic reconfigurable architecture based on static Random Access Memory (RAM) to implement multiple configuration planes that enable fast switching between contexts. The main challenge of these types of multi-context FPGAs are limited on-chip storage and relatively long reconfiguration latency (of the order of milliseconds). With technology driving down to nano scale, new generations of hybrid multi-context FPGA architectures, such as the CMOS/NAnoTechnology reconfigURablE architecture (NATURE) that use on-chip nano RAMs to store multiple configurations to enable extremely fast runtime reconfiguration (of the order of pico seconds) have been developed. This type of FPGA enables cycle-by-cycle reconfiguration and temporal logic folding resulting in improved logic density and area-delay product by more than an order of magnitude compared to traditional FPGAs. However, the fine granularity of this type of architecture limits its usage as a high performance hardware accelerator that implements compute intensive arithmetic operations. This research work explores and presents how DSP architectures can be designed for the hybrid multi-context NATURE platform in order to fully exploit its advantages and possibilities. The performance of various compute intensive signal processing kernels are used in the study to benchmark the improvements achieved by the proposed DSP architectures. A full-block dynamically reconfigurable DSP architecture is first presented, which can be reconfigured at runtime to implement different arithmetic functions in different clock cycles. To fully exploit the capability of temporal logic folding techniques in NATURE, the DSP block is then extended to support pipeline level reconfiguration that allows independent reconfiguration of individual pipeline stages. To enable efficient implementation of mixed-precision applications, the capability to dynamically fracture the internal compute-path of the DSP block is also incorporated into the design. The design automation tool for the NATURE platform is extended to enable efficient mapping of compute intensive kernels utilizing the proposed DSP architecture(s) by exploring optimum resource sharing and area/power reduction. A design space exploration algorithm is developed and…

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.