Energy-Efficient Dataflow Scheduling of CNN Applications for Vector-SIMD DSP

Wontae Kim, Hyuk-Jae Lee, Ilwi Yun, Sangheon Lee, Chulhee Lee, Kyujoong Lee

doi:10.1109/access.2022.3197206


Open Access

https://doi.org/10.1109/access.2022.3197206


Abstract

Dataflow-scheduling techniques for convolutional neural networks (CNNs) have been studied extensively to minimize off-chip memory access. However, the efficiency of previously proposed techniques is limited because their optimizations consider only general hardware such as FPGAs and GPUs. To overcome this limitation, this paper proposes dataflow scheduling for a vector-SIMD DSP that minimizes the energy consumed by off-chip memory access. First, the proposed technique attempts to group as many of the given layers as possible. Within a group, the tiles of different layers are executed in sequence without off-chip memory access, except at the first and last layers of the group. The number of layers in a group is chosen to minimize the energy consumption of the off-chip memory, as estimated by the proposed energy model of the off-chip memory. However, grouping layers incurs additional computation. To minimize this overhead, this paper solves an optimization problem for the grouped layers. Second, for layers that cannot be grouped, tiling along the W-axis is not considered, so as to maximize the size of the overlapped data in consecutive tiles. Consequently, reuse of the overlapped data in the on-chip buffer is maximized, which reduces the energy consumed by the off-chip memory. For evaluation, a cycle-accurate simulation environment is established that measures the energy consumption of the off-chip memory by tracing the data transferred between a vector-SIMD DSP and an off-chip memory. The experimental results show that, compared with the baseline tiling and scheduling techniques, the proposed technique reduces the energy consumption by an average of 51% for CNN applications such as Tiny YOLOv2, MobileNetv1, and VDSR.
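The trade-off behind layer grouping can be illustrated with a small Python sketch. It is not the paper's energy model or optimization: it counts off-chip feature-map elements as a rough proxy for energy, ignores weight traffic and on-chip buffer capacity, and assumes square kernels, stride 1, "same" padding, and tiling along the H-axis only. All names (`ConvLayer`, `rows_needed`, `traffic_grouped`, and so on) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    kernel: int  # square kernel; stride 1 and 'same' padding are assumed
    c_in: int    # input channels
    c_out: int   # output channels


def rows_needed(layers, out_rows):
    """Back-propagate a tile height through a group of layers.

    Returns the rows required at each boundary:
    [input of first layer, output of layer 0, ..., output of last layer].
    """
    rows = [out_rows]
    for layer in reversed(layers):
        # A stride-1 k x k convolution needs k - 1 extra input rows.
        rows.append(rows[-1] + layer.kernel - 1)
    return rows[::-1]


def traffic_layer_by_layer(layers, h, w):
    """Elements moved off-chip when every layer reads its input map
    and writes its output map."""
    return sum(h * w * (l.c_in + l.c_out) for l in layers)


def traffic_grouped(layers, h, w, out_rows):
    """Elements moved off-chip when the layers form one group: only the
    first input is read (including overlapping rows, as an upper bound)
    and only the last output is written."""
    tiles = math.ceil(h / out_rows)
    in_rows = min(rows_needed(layers, out_rows)[0], h)
    read = tiles * in_rows * w * layers[0].c_in
    write = h * w * layers[-1].c_out
    return read + write


def recompute_ratio(layers, h, out_rows):
    """Output rows computed across all tiles divided by the rows strictly
    needed. Values above 1.0 indicate recomputation caused by grouping
    (a row-count proxy, not a MAC count)."""
    tiles = math.ceil(h / out_rows)
    per_tile = rows_needed(layers, out_rows)[1:]
    computed = sum(tiles * min(r, h) for r in per_tile)
    return computed / (len(layers) * h)


if __name__ == "__main__":
    group = [ConvLayer(3, 3, 16), ConvLayer(3, 16, 32), ConvLayer(3, 32, 32)]
    print(rows_needed(group, 8))
    print(traffic_layer_by_layer(group, 64, 64))
    print(traffic_grouped(group, 64, 64, 8))
    print(recompute_ratio(group, 64, 8))
```

Under these assumptions, grouping three 3x3 layers cuts feature-map traffic substantially while the overlapping rows raise the computed row count, which is the overhead the abstract says the optimization addresses. Larger tiles reduce the recomputation but need more on-chip buffer, which this sketch does not model.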


Journal: IEEE Access
Publication Date: Jan 1, 2022
License type: CC BY 4.0

