Inter-tile reuse optimization applied to bandwidth constrained embedded accelerators

Maurice Peemen, Bart Mesman, Henk Corporaal

doi:10.5555/2755753.2755790

Abstract

The adoption of High-Level Synthesis (HLS) tools has significantly reduced accelerator design time. A complex scaling problem that remains is the data transfer bottleneck. To scale up performance, accelerators require huge amounts of data and are often limited by interconnect resources. In addition, the energy spent by the accelerator is often dominated by the transfer of data, either in the form of memory references or data movement on the interconnect. In this paper we drastically reduce accelerator communication by exploring computation reordering and local buffer usage. To this end, we present a new analytical methodology to optimize nested loops for inter-tile data reuse with loop transformations such as interchange and tiling. We focus on embedded accelerators that can be used in a multi-accelerator System on Chip (SoC), so performance, area, and energy are key in this exploration. 1) On three common embedded applications in the image/video processing domain (demosaicing, block matching, object detection), we show that our methodology reduces data movement by up to 2.1x compared to the best case of intra-tile optimization. 2) We demonstrate that our small accelerators (1-3% of FPGA resources) can boost a simple MicroBlaze soft-core to the performance level of a high-end Intel i7 processor.
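To make the distinction between intra-tile and inter-tile reuse concrete, the following is a minimal toy sketch, not the paper's analytical methodology. It assumes a hypothetical K x K sliding-window kernel over a 2D image, tiles visited left to right within each tile row, and a local buffer that holds exactly one tile's input footprint. The function name `count_loads` and all parameters are illustrative assumptions, and the resulting numbers are unrelated to the 2.1x figure reported in the abstract.

```python
def count_loads(height, width, k, th, tw, reuse_between_tiles):
    """Count input elements fetched from external memory for a tiled
    k x k sliding-window computation over a height x width output.

    Assumption (illustrative only): the local buffer holds the input
    footprint of the previous tile in the same tile row.
    """
    loads = 0
    for ty in range(0, height, th):
        buffered = set()  # local buffer is empty at the start of a tile row
        for tx in range(0, width, tw):
            # Input footprint of this output tile: tile size plus a halo of k - 1.
            footprint = {
                (y, x)
                for y in range(ty, min(ty + th, height) + k - 1)
                for x in range(tx, min(tx + tw, width) + k - 1)
            }
            if reuse_between_tiles:
                # Inter-tile reuse: fetch only what the previous tile did not leave behind.
                loads += len(footprint - buffered)
                buffered = footprint
            else:
                # Intra-tile reuse only: every tile refetches its full footprint.
                loads += len(footprint)
    return loads


intra = count_loads(64, 64, 5, 8, 8, reuse_between_tiles=False)
inter = count_loads(64, 64, 5, 8, 8, reuse_between_tiles=True)
print(intra, inter)  # prints "9216 6528"
```

In this toy setting, keeping the overlapping halo columns between horizontally adjacent tiles removes about 29% of the external loads. Which traversal order and tile shape minimize traffic depends on the loop nest and the buffer size, which is the design space the abstract says the methodology explores.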

Published: 01 Jan 2015