Abstract

Deep convolutional neural networks (CNNs) are difficult to deploy fully on edge devices because of their memory-intensive and computation-intensive workloads. The energy efficiency of CNNs is dominated by convolution computation and off-chip memory (DRAM) accesses, with DRAM accesses being the most costly. In this article, an energy-efficient accelerator is proposed for sparse compressed CNNs that reduces DRAM accesses and eliminates zero-operand computation. Weight compression is applied to sparse compressed CNNs to reduce the required memory capacity/bandwidth and prune a large portion of connections. A tile-based row-independent compression (TRC) method with relative indexing memory is adopted to store only the non-zero terms. Additionally, the workloads are distributed across channels to increase the degree of task parallelism, and all-row-to-all-row non-zero element multiplication is adopted to skip redundant computation. Compared with a dense accelerator baseline, simulation results show that the proposed accelerator achieves a $1.79\times$ speedup and reduces on-chip memory size, energy, and DRAM accesses by 23.51%, 69.53%, and 88.67%, respectively, for VGG-16.
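As a rough illustration of the relative indexing idea mentioned above, the following Python sketch stores only the non-zero values of one tile row together with each value's offset from the previous non-zero element. This is not the paper's exact storage format; the tile width, bit-width assumptions, function names, and example values are illustrative only.

```python
# Minimal sketch of relative-index compression for one row of a weight tile.
# The tile width and names are illustrative assumptions, not the paper's format.

def compress_row(row):
    """Return (values, relative_indices) keeping only non-zero entries.

    Each stored index is the offset from the previous non-zero element,
    so indices stay small and need only a few bits per entry.
    """
    values, rel_indices = [], []
    prev = -1
    for i, v in enumerate(row):
        if v != 0:
            values.append(v)
            rel_indices.append(i - prev)  # distance to previous non-zero
            prev = i
    return values, rel_indices


def decompress_row(values, rel_indices, length):
    """Reconstruct the dense row from the compressed representation."""
    row = [0] * length
    pos = -1
    for v, step in zip(values, rel_indices):
        pos += step
        row[pos] = v
    return row


# Example: a sparse 1x8 tile row
row = [0, 3, 0, 0, -2, 0, 0, 5]
vals, idx = compress_row(row)        # vals = [3, -2, 5], idx = [2, 3, 3]
assert decompress_row(vals, idx, len(row)) == row
```

Because each row is compressed independently of the others, a row can be fetched and decoded on its own, which is the property the tile-based row-independent scheme relies on.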

Highlights

  • Over the past few years, convolutional neural networks (CNNs) have been used to address many critical machine learning problems and have gained popularity in numerous computer-vision applications [1]–[3]

  • The proposed architecture can also speed up network models with branching structures, such as ResNet or MobileNet. For both ResNet and MobileNet, the residual layers can be performed locally in each PE, using a larger input buffer to hold the activations of the previous layer so that extra DRAM accesses are eliminated, while the sparse computation is executed with the TRC method and relative indexing memory (a minimal software sketch of this local buffering follows this list)

  • Deep CNNs are rapidly rising in popularity across a broad range of applications

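The local buffering of residual inputs described in the second highlight can be pictured with the short Python sketch below. It is not the paper's PE design; the class name, buffer size, and method names are assumptions used only to show that the shortcut addition can be done from an on-chip buffer instead of re-reading the previous layer's activations from DRAM.

```python
# Minimal sketch (assumptions, not the paper's PE design): a PE keeps the
# previous layer's activations in a local input buffer so the residual
# shortcut can be added without fetching them again from DRAM.

class PE:
    def __init__(self, buffer_size):
        self.input_buffer = [0] * buffer_size   # holds previous-layer activations

    def load_previous_layer(self, activations):
        # Filled once when the previous layer finishes; avoids a later DRAM read.
        self.input_buffer[:len(activations)] = activations

    def residual_add(self, conv_out):
        # Element-wise shortcut addition performed locally in the PE.
        return [c + a for c, a in zip(conv_out, self.input_buffer)]


pe = PE(buffer_size=8)
pe.load_previous_layer([1, 0, 2, 0, 0, 3, 0, 1])
out = pe.residual_add([0, 4, 0, 1, 2, 0, 0, 5])   # [1, 4, 2, 1, 2, 3, 0, 6]
```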

Summary

INTRODUCTION

Over the past few years, convolutional neural networks (CNNs) have been used to address many critical machine learning problems and have gained popularity in numerous computer-vision applications [1]–[3]. Convolutional (CONV) layers perform feature extraction on the input dataset by computing the outputs of neurons connected to local regions through convolution and non-linear activation functions. Several methods have been proposed to reduce the execution time of CONV layers by skipping zero operands based on zero activations [11], [12], zero weights [13], or both [13], [14]. Nevertheless, a large amount of data, including weights and fmaps, still has to be stored and transferred for large network models. An energy-efficient accelerator architecture is proposed for sparse CNNs to eliminate redundant computation and to reduce on-chip SRAM. 1) A tile-based row-independent compression (TRC) method is realized with relative indexing memory to store non-zero (NZ) activations/weights, reducing the total amount of DRAM accesses and the memory sizes. The routing complexity of all-row-to-all-row element-wise multiplications is reduced compared with all-to-all element-wise multiplications
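The zero-skipping row computation can be pictured in software as follows. This Python sketch is only an analogy of the all-row-to-all-row non-zero multiplication, not the hardware dataflow: it takes one activation row and one weight row in the compressed form sketched earlier, multiplies every non-zero activation with every non-zero weight, and scatters each product to the output column recovered from the relative indices. The 1-D indexing convention and the function names are assumptions.

```python
# Rough software analogy of row-to-row zero-skipping (1-D cross-correlation
# of one activation row with one weight row). Dataflow, indexing convention,
# and names are illustrative assumptions, not the accelerator's exact design.

def sparse_row_conv(act_vals, act_idx, w_vals, w_idx, act_len, w_len):
    """Multiply every non-zero activation with every non-zero weight and
    scatter the product to the output position recovered from the indices.
    Zero operands are never fetched, so no multiplier cycle is wasted on them."""
    out = [0] * (act_len - w_len + 1)

    # Recover absolute positions from the relative indices.
    a_pos, pos = [], -1
    for step in act_idx:
        pos += step
        a_pos.append(pos)
    w_pos, pos = [], -1
    for step in w_idx:
        pos += step
        w_pos.append(pos)

    for a, x in zip(act_vals, a_pos):
        for w, k in zip(w_vals, w_pos):
            o = x - k                      # output column for this NZ pair
            if 0 <= o < len(out):
                out[o] += a * w
    return out


# Example: activation row [0, 3, 0, 0, -2, 0, 0, 5] and weight row [1, 0, 2]
# in compressed form; the result equals the dense 1-D cross-correlation.
out = sparse_row_conv([3, -2, 5], [2, 3, 3], [1, 2], [1, 2], act_len=8, w_len=3)
print(out)   # [0, 3, -4, 0, -2, 10]
```

Only pairs of non-zero operands reach the multiply-accumulate step, which is the behavior the zero-operand elimination in the proposed accelerator targets.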

DEEP COMPRESSION FOR SPARSE CNN
PE ARRAY FOR CONVOLUTION
IMPLEMENTATION AND SIMULATION RESULTS
Findings
CONCLUSION