Ultra-Elastic CGRAs for Irregular Loop Specialization

Christopher Torng, Peitian Pan, Cheng Tan, Christopher Batten, Yanghui Ou

doi:10.1109/hpca51647.2021.00042

Abstract

Reconfigurable accelerator fabrics, including coarse-grain reconfigurable arrays (CGRAs), have experienced a resurgence in interest because they allow fast-paced software algorithm development to continue evolving post-fabrication. CGRAs traditionally target regular workloads with data-level parallelism (e.g., neural networks, image processing), but once integrated into an SoC they remain idle and unused for irregular workloads. An emerging trend towards repurposing these idle resources raises important questions about how to efficiently map and execute general-purpose loops that may have irregular memory accesses, irregular control flow, and inter-iteration loop dependencies. Recent work has increasingly leveraged elasticity in CGRAs to mitigate the first two challenges, but elasticity alone does not address inter-iteration loop dependencies, which can easily bottleneck overall performance. In this paper, we address all three challenges for irregular loop specialization and propose ultra-elastic CGRAs (UE-CGRAs), a novel elastic CGRA that accelerates true-dependency bottlenecks and saves energy in irregular loops by overcoming traditional VLSI challenges. UE-CGRAs allow configurable fine-grain dynamic voltage and frequency scaling (DVFS) for each of potentially hundreds of tiny processing elements (PEs) in the CGRA, enabling chains of connected PEs to “rest” at lower voltages and frequencies to save energy, while other chains of connected PEs can “sprint” at higher voltages and frequencies to accelerate through true-dependency bottlenecks. UE-CGRAs rely on a novel ratiochronous clocking scheme carefully overlaid on the inter-PE elastic interconnect to enable low-latency crossings while remaining fully verifiable with commercial static timing analysis tools.
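To make the clocking idea more concrete, here is a minimal Python sketch of why rationally related clocks lend themselves to static analysis. It is our own simplification, not the paper's circuit: it assumes every PE clock is an integer division of one shared reference clock and that all clocks start aligned, so the pattern of sender and receiver edges repeats every least common multiple of the two periods. Under that assumption, only a finite set of launch/capture edge pairs exists, and each pair can be checked against a timing budget. The functions `crossing_pairs` and `unsafe_launches` and the `min_gap` budget are hypothetical illustrations.

```python
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def crossing_pairs(div_tx: int, div_rx: int) -> list[tuple[int, int]]:
    """Launch/capture edge pairs over one repeating pattern (hyperperiod).

    Illustrative assumptions: both clocks are divided from one reference
    clock (period = div ticks), start aligned at tick 0, and data launched
    on a sender edge is captured on the first receiver edge strictly after it.
    """
    hyper = lcm(div_tx, div_rx)  # the edge pattern repeats every `hyper` ticks
    pairs = []
    for launch in range(0, hyper, div_tx):
        capture = (launch // div_rx + 1) * div_rx  # next receiver edge > launch
        pairs.append((launch, capture))
    return pairs


def unsafe_launches(div_tx: int, div_rx: int, min_gap: int) -> list[int]:
    """Sender edges whose capture window is shorter than `min_gap` ticks.

    In this toy model such a launch would have to wait for a later receiver
    edge; `min_gap` is a hypothetical stand-in for the timing budget.
    """
    return [launch for launch, capture in crossing_pairs(div_tx, div_rx)
            if capture - launch < min_gap]


if __name__ == "__main__":
    # Sender divides the reference by 2 (faster), receiver by 3 (slower).
    print(crossing_pairs(2, 3))
    print(unsafe_launches(2, 3, 2))
```

For a sender dividing the reference by 2 and a receiver dividing it by 3, the pattern repeats every 6 ticks and yields the pairs (0, 3), (2, 3), and (4, 6); with a 2-tick budget, only the launch at tick 2 falls short. Because the set of pairs is finite and known in advance, every crossing can be enumerated and checked, which is the intuition behind keeping such clocking compatible with conventional static timing analysis.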
We present the UE-CGRA analytical model, compiler, architectural template, and VLSI circuitry, and we demonstrate how UE-CGRAs can specialize for irregular loops and improve performance ($1.42$–$1.50\times$) or energy efficiency ($1.24$–$2.32\times$) with reasonable area overhead compared to traditional inelastic and elastic CGRAs, while also improving performance ($1.35$–$3.38\times$) or energy efficiency (up to $1.53\times$) compared to a RISC-V core.
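The paper's analytical model is not reproduced here. As a hedged illustration of the rest/sprint intuition only, the toy Python model below assumes that each PE operation takes one cycle of that PE's own clock, that clock-domain crossings add no latency, that every PE fires once per loop iteration, and that dynamic energy per operation scales as $V^2$ (leakage ignored). Throughput is then bounded both by the loop-carried dependency cycle and by the slowest PE. The operating points `NOMINAL`, `REST`, and `SPRINT` and the PE names are hypothetical values chosen for round numbers, not figures from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    """Hypothetical DVFS operating point, relative to nominal (1.0, 1.0)."""
    volts: float
    freq: float


# Illustrative operating points only; NOT values from the paper.
NOMINAL = Mode(volts=1.0, freq=1.0)
REST = Mode(volts=0.8, freq=0.625)
SPRINT = Mode(volts=1.1, freq=1.25)


def iteration_interval(recurrence: list[str], modes: dict[str, Mode]) -> float:
    """Time per loop iteration in nominal clock cycles (toy model).

    recurrence: hypothetical PE names on the loop-carried dependency cycle.
    modes: PE name -> operating point.
    """
    # The dependency cycle must complete before the next iteration can start.
    recurrence_time = sum(1.0 / modes[pe].freq for pe in recurrence)
    # No PE fires faster than its own clock, so the slowest PE is also a bound.
    slowest_period = max(1.0 / m.freq for m in modes.values())
    return max(recurrence_time, slowest_period)


def energy_per_iteration(modes: dict[str, Mode]) -> float:
    """Dynamic energy per iteration, assuming E_op ~ C * V^2 with C = 1
    and exactly one operation per PE per iteration."""
    return sum(m.volts ** 2 for m in modes.values())


if __name__ == "__main__":
    recurrence = ["a", "b"]  # hypothetical a -> b -> a dependency cycle
    configs = {
        "baseline": {pe: NOMINAL for pe in "abcd"},
        "rest": {"a": NOMINAL, "b": NOMINAL, "c": REST, "d": REST},
        "sprint+rest": {"a": SPRINT, "b": SPRINT, "c": REST, "d": REST},
    }
    for name, modes in configs.items():
        ii = iteration_interval(recurrence, modes)
        energy = energy_per_iteration(modes)
        print(f"{name:12s} II={ii:.2f} energy={energy:.2f}")
```

In this toy setup, resting only the off-recurrence PEs keeps the iteration interval at 2.00 nominal cycles while lowering energy from 4.00 to 3.28 units, and sprinting the recurrence while resting the other PEs shortens the interval to 1.60 cycles at 3.70 units. Resting too aggressively would make the slowest PE the new bottleneck, which is why the model keeps both bounds.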

Similar Papers

Design-Space Exploration of Application-specific Instruction-set Processor Design
M H Sargolzaei
International Journal of Computing, 31 Dec 2021

HierCGRA: A Novel Framework for Large-scale CGRA with Hierarchical Modeling and Automated Design Space Exploration
Sichao Chen ... Su Zheng
ACM Transactions on Reconfigurable Technology and Systems, Vol. 17, 10 May 2024

Reliability-Aware Dynamic Voltage and Frequency Scaling
F Firouzi ... S Safari
01 Jul 2010

Scheduler for Inhomogeneous and Irregular CGRAs with Support for Complex Control Flow
Tajas Ruschke ... Dennis Wolf
01 May 2016
