Efficient and flexible memory architecture to alleviate data and context bandwidth bottlenecks of coarse-grained reconfigurable arrays

Chen Yang, Shaojun Wei, Leibo Liu, Shouyi Yin

doi:10.1007/s11433-014-5610-2

Abstract

The computational capability of a coarse-grained reconfigurable array (CGRA) can be significantly restricted by data and context memory bandwidth bottlenecks. Traditionally, two methods have been used to address the context side of this problem. The first loads the context into the CGRA at run time; it occupies very little on-chip memory but incurs very large latency, which leads to low computational efficiency. The second adopts a multi-context structure: the context is loaded into the on-chip context memory at the boot phase, and broadcasting a pointer to a set of contexts changes the hardware configuration on a cycle-by-cycle basis. However, the size of the context memory imposes a large area overhead in multi-context structures, which severely restricts application complexity. This paper proposes a Predictable Context Cache (PCC) architecture that addresses these context issues by buffering the context inside the CGRA. In this architecture, context is dynamically transferred into the CGRA. Using a PCC significantly reduces the on-chip context memory, and the complexity of the applications running on the CGRA is no longer restricted by the size of the on-chip context memory. For the data bandwidth issue, data preloading is the most frequently used approach to hide input data latency and speed up data transmission. Rather than fundamentally reducing the amount of input data, it lets data transfer and computation proceed in parallel. However, data preloading cannot work efficiently because data transmission becomes the critical path as the scale of the reconfigurable array increases. This paper also presents a Hierarchical Data Memory (HDM) architecture as a solution to this efficiency problem. In this architecture, high internal bandwidth is provided to buffer both reused input data and intermediate data. The HDM architecture relieves the external memory of the data transfer burden, so performance is significantly improved. With PCC and HDM, experiments running mainstream video decoding programs achieved performance improvements of 13.57%–19.48% with a reasonable memory size. As a result, 1080p@35.7fps H.264 high profile video decoding can be achieved on the PCC and HDM architecture at a working frequency of 200 MHz. Furthermore, complex applications were no longer restricted by the size of the on-chip context memory and were executed efficiently on the PCC and HDM architecture.
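
To put the reported decoding rate in perspective, the following is a rough cycle-budget calculation, not a figure from the paper. It uses only the 200 MHz clock and 35.7 fps quoted in the abstract; the 16x16 macroblock size and the padding of a 1080-line frame to 1088 lines are assumptions based on common H.264 practice, and the function names are illustrative.

```python
# Back-of-the-envelope cycle budget implied by the reported result.
# From the abstract: 200 MHz working frequency, 1080p at 35.7 frames/s.
# Assumption (not from the abstract): 16x16 macroblocks, so a 1920x1080
# frame is treated as 120 x 68 macroblocks (height padded to 1088 lines).
import math

CLOCK_HZ = 200_000_000
FRAMES_PER_SECOND = 35.7
WIDTH, HEIGHT = 1920, 1080
MB_SIZE = 16


def macroblocks_per_frame(width, height, mb_size=MB_SIZE):
    """Number of macroblocks needed to cover a frame, rounding up."""
    return math.ceil(width / mb_size) * math.ceil(height / mb_size)


def cycles_per_frame(clock_hz=CLOCK_HZ, fps=FRAMES_PER_SECOND):
    """Clock cycles available per decoded frame at the given rate."""
    return clock_hz / fps


def cycles_per_macroblock():
    """Average clock cycles available per macroblock."""
    return cycles_per_frame() / macroblocks_per_frame(WIDTH, HEIGHT)


if __name__ == "__main__":
    print("macroblocks per frame:", macroblocks_per_frame(WIDTH, HEIGHT))
    print("cycles per frame:", round(cycles_per_frame()))
    print("cycles per macroblock:", round(cycles_per_macroblock(), 1))
```

Under these assumptions the budget works out to roughly 5.6 million cycles per frame, or just under 700 cycles per macroblock on average, which gives a sense of how little slack remains for context and data transfers that are not overlapped with computation.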

Journal: Science China Physics, Mechanics & Astronomy
Publication Date: Oct 21, 2014
Citations: 2

Similar Papers

Cost-Effective Memory Architecture to Achieve Flexible Configuration and Efficient Data Transmission for Coarse-Grained Reconfigurable Array (Abstract Only)
Chen Yang ... Shaojun Wei
22 Feb 2015

Context-memory Aware Mapping for Energy Efficient Acceleration with CGRAs
Satyajit Das ... Kevin J M Martin
01 Mar 2019

The research of interconnection network on coarse-grained reconfigurable Cipher Logic Array
Yuanming Li ... Yinjian Yan
01 Mar 2017

Scheduler for Inhomogeneous and Irregular CGRAs with Support for Complex Control Flow
Tajas Ruschke ... Dennis Wolf
01 May 2016
