Accelerating Nested Conditionals on CGRA With Tag-Based Full Predication Method

Jiang Sha, Yingying Zhao, Yu Gong, Wenbo Song

doi:10.1109/access.2020.3001220

Open Access

https://doi.org/10.1109/access.2020.3001220

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 3
License type: CC BY 4.0

Affiliation: Southeast University

Abstract

Coarse-grained reconfigurable architecture (CGRA) has been widely considered one of the most promising computing architectures for exploiting spatial parallelism. Compared with typical general-purpose architectures, which are instruction-driven, most state-of-the-art CGRAs follow a data-driven strategy, which makes control flow, in particular nested if-then-else (NITE) structures, difficult to handle. To tackle this problem, existing techniques such as partial predication and full predication introduce extra conditional move and select operations, while state-based full predication (SFP) introduces sleep and awake operations to implement NITE correctly. These redundant operations, however, degrade performance. In this paper, a novel tag-based full predication (TFP) strategy is proposed to eliminate the redundant operations and thus accelerate NITE on CGRAs. An extra tag field is added to each instruction word to implement distributed nullification and parallel tag register (TReg) overwriting. Hardware support for TFP is presented, and experimental validation is based on RTL simulation with manual mapping. Results show that our method achieves an average performance gain of over 30% compared with SFP, at the expense of around 5% additional power consumption and negligible area overhead.
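The idea of tag-driven nullification can be illustrated with a small software model. The sketch below is not the paper's hardware design: the `(tag_mask, tag_value)` encoding, the `sets_tag_bit` field, the single shared `treg`, and the sequential loop are all assumptions made for illustration (the abstract describes distributed nullification and parallel TReg overwriting, which a sequential Python loop does not capture). It only shows how an instruction carrying a tag can be skipped without any extra select, conditional move, sleep, or awake operations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Regs = Dict[str, int]


@dataclass
class Instr:
    op: Callable[[Regs], int]            # computation performed by the instruction
    dest: Optional[str] = None           # data register to write, if any
    tag_mask: int = 0                    # hypothetical: TReg bits this tag checks
    tag_value: int = 0                   # hypothetical: required value of those bits
    sets_tag_bit: Optional[int] = None   # hypothetical: TReg bit overwritten by the result


def run(program: List[Instr], regs: Regs) -> Tuple[Regs, int]:
    """Execute a tagged program; return the registers and the executed-instruction count."""
    treg = 0       # toy tag register, one bit per branch condition
    executed = 0
    for ins in program:
        # Nullification: an instruction whose tag does not match TReg is skipped.
        if (treg & ins.tag_mask) != ins.tag_value:
            continue
        result = ins.op(regs)
        executed += 1
        if ins.sets_tag_bit is not None:
            bit = 1 << ins.sets_tag_bit
            treg = (treg | bit) if result else (treg & ~bit)
        if ins.dest is not None:
            regs[ins.dest] = result
    return regs, executed


# Nested if-then-else used as the example:
#   if a > 0:
#       if b > 0: y = a + b
#       else:     y = a - b
#   else:
#       y = -a
NITE: List[Instr] = [
    Instr(op=lambda r: r["a"] > 0, sets_tag_bit=0),
    Instr(op=lambda r: r["b"] > 0, sets_tag_bit=1, tag_mask=0b01, tag_value=0b01),
    Instr(op=lambda r: r["a"] + r["b"], dest="y", tag_mask=0b11, tag_value=0b11),
    Instr(op=lambda r: r["a"] - r["b"], dest="y", tag_mask=0b11, tag_value=0b01),
    Instr(op=lambda r: -r["a"], dest="y", tag_mask=0b01, tag_value=0b00),
]
```

Calling `run(NITE, {"a": 3, "b": 2})` executes only the two condition instructions and the `y = a + b` assignment. The instructions on the other two paths are nullified by their tags, and no merge operation is needed to produce `y`.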

Highlights

CGRAs have been considered as one of the most promising architectures, which fill the gap between the General-Purpose Processors (GPPs) and the Application Specific Integrated Circuits (ASICs)
The design is written in Verilog; RTL simulation is conducted with the Vivado suite on a Xilinx Artix-7 Field Programmable Gate Array (FPGA) to evaluate performance, and the design is synthesized with Design Compiler (DC) in TSMC 40 nm technology to evaluate power and area
The binary instructions are generated semi-automatically, by using a script to replace the assembly code with its binary encoding

Summary

INTRODUCTION

CGRAs have been considered one of the most promising architectures, filling the gap between General-Purpose Processors (GPPs) and Application Specific Integrated Circuits (ASICs). Classic GPPs maximize flexibility with a purely instruction-driven execution mechanism, while ASICs are commonly data-driven, maximizing performance and power efficiency. According to Amdahl's law [10], once the compute-intensive parts of a loop have been accelerated to a sufficiently large extent, the control-intensive part becomes the performance bottleneck. In this case, NITE is the performance bottleneck for CGRAs, given the minor acceleration it receives. In TFP, tags directly transfer predication information and remove all the extra operations required by SFP and partial predication (PP), resulting in better performance.
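The Amdahl's law argument can be made concrete with a few lines of arithmetic. The sketch below uses the standard formula, speedup = 1 / ((1 - f) + f / s), where f is the fraction of run time that is accelerated and s is the acceleration factor. The function names and the 80% figure in the usage note are hypothetical and do not come from the paper.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of run time is sped up by `factor`."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    remaining = 1.0 - accelerated_fraction   # part left untouched, e.g. control flow
    return 1.0 / (remaining + accelerated_fraction / factor)


def speedup_limit(accelerated_fraction: float) -> float:
    """Upper bound on the overall speedup as the acceleration factor grows without limit."""
    if not 0.0 <= accelerated_fraction < 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1)")
    return 1.0 / (1.0 - accelerated_fraction)
```

For example, if a hypothetical loop spends 80% of its time in compute-intensive code, `speedup_limit(0.8)` bounds the overall speedup at about 5 however fast that part becomes. Past that point, only accelerating the remaining control-intensive 20% helps, which is the situation the paragraph above describes for NITE.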

BACKGROUND

COMPARISON OF POWER CONSUMPTION AND AREA OVERHEAD

CONCLUSION