Exploiting Parallelism of Imperfect Nested Loops on Coarse-Grained Reconfigurable Architectures

Shouyi Yin, Shaojun Wei, Leibo Liu, Xinhan Lin

doi:10.1109/tpds.2016.2531678

Abstract

Coarse-grained reconfigurable architecture (CGRA) is a promising parallel computing platform that provides high performance, high power efficiency and flexibility. However, for imperfect nested loops, existing loop mapping methods often result in low execution performance and poor hardware utilization. To tackle this problem, this paper makes three contributions: 1) a highly effective and general approach to mapping imperfect loops onto CGRAs; 2) a global optimization strategy to search for the optimal initiation intervals (IIs); 3) a powerful kernel compression method to reduce oversized kernels. Experimental results show that our approach reduces the total computing latency by 20.5, 58.5 and 73.2 percent compared to state-of-the-art approaches on $2 \times 2$, $4 \times 4$ and $8 \times 8$ CGRAs, respectively. Moreover, the compilation time and configuration context size are acceptable in practice.
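For readers unfamiliar with the terminology, the following is a minimal illustrative sketch and not the mapping method proposed in the paper. It shows what makes a loop nest "imperfect" (statements in the outer loop body that lie outside the innermost loop) and uses the standard software-pipelining estimate to show why a smaller initiation interval lowers total latency. The function names `imperfect_nest` and `pipelined_cycles` and all numeric values are hypothetical and chosen only for illustration.

```python
def imperfect_nest(a, b):
    """Imperfect nest: the outer body has statements outside the inner loop."""
    out = []
    for i in range(len(a)):
        acc = a[i] * 2              # outer-only statement, runs once per i
        for j in range(len(b)):
            acc += a[i] * b[j]      # innermost loop body
        out.append(acc)             # outer-only statement, runs once per i
    return out


def pipelined_cycles(trip_count, ii, schedule_length):
    """Standard estimate for a software-pipelined loop.

    A new iteration starts every `ii` cycles, and the last iteration
    still needs `schedule_length` cycles to finish.
    """
    if trip_count == 0:
        return 0
    return (trip_count - 1) * ii + schedule_length


if __name__ == "__main__":
    print(imperfect_nest([1, 2], [3, 4]))
    # Hypothetical numbers: same loop, two different initiation intervals.
    print(pipelined_cycles(100, ii=2, schedule_length=10))
    print(pipelined_cycles(100, ii=1, schedule_length=10))
```

In a perfect nest, all statements would sit inside the innermost loop; here the initialization of `acc` and the `append` belong only to the outer loop, which is the situation the abstract says existing mapping methods handle poorly.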


Journal: IEEE Transactions on Parallel and Distributed Systems
Publication Date: Nov 1, 2016
Citations: 38

