Abstract

The Coarse-Grained Reconfigurable Architecture (CGRA) is widely considered one of the most promising computing architectures for exploiting spatial parallelism. Unlike typical general-purpose architectures, which are instruction-driven, most state-of-the-art CGRAs follow a data-driven strategy, which makes control flow, in particular nested if-then-else (NITE) structures, difficult to handle. To tackle this problem, existing techniques such as partial predication and full predication introduce extra conditional move and select operations, while state-based full predication (SFP) introduces sleep and awake operations to implement the basic function of NITE correctly. These redundant operations, however, degrade performance. In this paper, a novel tag-based full predication (TFP) strategy is proposed, which aims to eliminate redundant operations and thus accelerate NITE on CGRAs. An extra tag field is added to each instruction word to implement distributed nullification and parallel tag register (TReg) overwriting. Hardware support for TFP is presented, and experimental validation is based on RTL simulation with manual mapping. Results show that our method achieves a performance gain of over 30% on average compared with SFP, at the expense of around 5% additional power consumption and negligible area overhead.

Highlights

  • CGRAs have been considered one of the most promising architectures, filling the gap between General-Purpose Processors (GPPs) and Application Specific Integrated Circuits (ASICs)

  • The design is written in Verilog. RTL simulation is conducted with the Vivado suite on Xilinx Artix-7 Field Programmable Gate Arrays (FPGAs) to evaluate performance, and the design is synthesized using Design Compiler (DC) in TSMC 40nm technology to evaluate power and area

  • The binary instructions are generated semi-automatically: a script replaces the assembly code with the corresponding binary encoding


Summary

INTRODUCTION

CGRAs have been considered one of the most promising architectures, filling the gap between General-Purpose Processors (GPPs) and Application Specific Integrated Circuits (ASICs). Classic GPPs maximize flexibility with a purely instruction-driven execution mechanism, while ASICs are commonly data-driven, maximizing performance and power efficiency. According to Amdahl’s law [10], once the compute-intensive parts of a loop have been accelerated to a sufficient extent, the control-intensive part becomes the performance bottleneck. In this case, NITE is the performance bottleneck for CGRAs, given that it is only marginally accelerated. Tags directly transfer predication information and remove all the extra operations required by SFP and partial predication (PP), resulting in better performance.
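
The Amdahl's law argument above can be made concrete with the standard form of the law. The symbols p (fraction of loop execution time spent in the compute-intensive part) and s (speedup applied to that part) are notation assumed here for illustration, as are the numbers in the worked example; none of them are taken from the paper.

```latex
% Overall speedup S when a fraction p of the runtime is accelerated by a factor s
S(s) = \frac{1}{(1 - p) + \dfrac{p}{s}},
\qquad
\lim_{s \to \infty} S(s) = \frac{1}{1 - p}

% Illustrative numbers only: p = 0.9
% s = 10:          S = 1 / (0.1 + 0.09) \approx 5.26
% s \to \infty:    S \to 1 / 0.1 = 10
```

Under these assumed numbers, however far the compute-intensive part is accelerated, the remaining fraction 1 - p (here standing in for the control-intensive part such as NITE) caps the overall speedup, which is why reducing the cost of that part matters.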

BACKGROUND
COMPARISON OF POWER CONSUMPTION AND AREA OVERHEAD
CONCLUSION