Abstract

Deep convolutional neural networks (CNNs) are difficult to deploy fully on edge devices because of their memory-intensive and computation-intensive workloads. The energy efficiency of CNNs is dominated by convolution computation and off-chip memory (DRAM) accesses, with DRAM accesses being the most costly. In this article, an energy-efficient accelerator is proposed for sparse compressed CNNs that reduces DRAM accesses and eliminates zero-operand computation. Weight compression is applied to sparse compressed CNNs to reduce the required memory capacity/bandwidth and prune a large portion of connections. A tile-based row-independent compression (TRC) method with relative indexing memory is adopted to store only the non-zero terms. Additionally, the workloads are distributed across channels to increase the degree of task parallelism, and all-row-to-all-row non-zero element multiplication is adopted to skip redundant computation. Compared with a dense accelerator baseline, simulation results show that the proposed accelerator achieves a $1.79\times$ speedup and reduces on-chip memory size, energy, and DRAM accesses by 23.51%, 69.53%, and 88.67%, respectively, for VGG-16.
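As a rough illustration of the relative indexing idea mentioned above, the following Python sketch stores only the non-zero values of one tile row together with each value's offset from the previous non-zero element. This is not the paper's exact storage format; the tile width, bit-width assumptions, function names, and example values are illustrative only.

```python
# Minimal sketch of relative-index compression for one row of a weight tile.
# The tile width and names are illustrative assumptions, not the paper's format.

def compress_row(row):
    """Return (values, relative_indices) keeping only non-zero entries.

    Each stored index is the offset from the previous non-zero element,
    so indices stay small and need only a few bits per entry.
    """
    values, rel_indices = [], []
    prev = -1
    for i, v in enumerate(row):
        if v != 0:
            values.append(v)
            rel_indices.append(i - prev)  # distance to previous non-zero
            prev = i
    return values, rel_indices


def decompress_row(values, rel_indices, length):
    """Reconstruct the dense row from the compressed representation."""
    row = [0] * length
    pos = -1
    for v, step in zip(values, rel_indices):
        pos += step
        row[pos] = v
    return row


# Example: a sparse 1x8 tile row
row = [0, 3, 0, 0, -2, 0, 0, 5]
vals, idx = compress_row(row)        # vals = [3, -2, 5], idx = [2, 3, 3]
assert decompress_row(vals, idx, len(row)) == row
```

Because each row is compressed independently of the others, a row can be fetched and decoded on its own, which is the property the tile-based row-independent scheme relies on.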

Highlights

  • Over the past few years, convolutional neural networks (CNNs) have been used to address many critical machine learning problems and have gained popularity in numerous computer-vision applications [1]–[3]

  • The proposed architecture can also speed up network models with branching structures, such as ResNet or MobileNet. For both ResNet and MobileNet, the residual layers can be performed locally in each PE, using a larger input buffer to hold the activations of the previous layer so that extra DRAM accesses are eliminated, while the sparse computation is executed with the TRC method and relative indexing memory (a minimal software sketch of this local buffering follows this list)

  • Deep CNNs are rapidly rising in popularity across a broad range of applications

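The local buffering of residual inputs described in the second highlight can be pictured with the short Python sketch below. It is not the paper's PE design; the class name, buffer size, and method names are assumptions used only to show that the shortcut addition can be done from an on-chip buffer instead of re-reading the previous layer's activations from DRAM.

```python
# Minimal sketch (assumptions, not the paper's PE design): a PE keeps the
# previous layer's activations in a local input buffer so the residual
# shortcut can be added without fetching them again from DRAM.

class PE:
    def __init__(self, buffer_size):
        self.input_buffer = [0] * buffer_size   # holds previous-layer activations

    def load_previous_layer(self, activations):
        # Filled once when the previous layer finishes; avoids a later DRAM read.
        self.input_buffer[:len(activations)] = activations

    def residual_add(self, conv_out):
        # Element-wise shortcut addition performed locally in the PE.
        return [c + a for c, a in zip(conv_out, self.input_buffer)]


pe = PE(buffer_size=8)
pe.load_previous_layer([1, 0, 2, 0, 0, 3, 0, 1])
out = pe.residual_add([0, 4, 0, 1, 2, 0, 0, 5])   # [1, 4, 2, 1, 2, 3, 0, 6]
```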

Summary

INTRODUCTION

Over the past few years, convolutional neural networks (CNNs) have been used to address many critical machine learning problems and have gained popularity in numerous computer-vision applications [1]–[3]. Convolutional (CONV) layers perform feature extraction on the input dataset by computing the outputs of neurons connected to local regions through convolution and non-linear activation functions. Several methods have been proposed to reduce the execution time of CONV layers by skipping zero operands based on zero activations [11], [12], zero weights [13], or both [13], [14]. Nevertheless, a large amount of data, including weights and fmaps, still has to be stored and transferred for large network models. An energy-efficient accelerator architecture is proposed for sparse CNNs to eliminate redundant computation and to reduce on-chip SRAM. 1) A tile-based row-independent compression (TRC) method is realized with relative indexing memory to store non-zero (NZ) activations/weights, reducing the total amount of DRAM accesses and the memory sizes. The routing complexity of all-row-to-all-row element-wise multiplications is reduced compared with all-to-all element-wise multiplications
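The zero-skipping row computation can be pictured in software as follows. This Python sketch is only an analogy of the all-row-to-all-row non-zero multiplication, not the hardware dataflow: it takes one activation row and one weight row in the compressed form sketched earlier, multiplies every non-zero activation with every non-zero weight, and scatters each product to the output column recovered from the relative indices. The 1-D indexing convention and the function names are assumptions.

```python
# Rough software analogy of row-to-row zero-skipping (1-D cross-correlation
# of one activation row with one weight row). Dataflow, indexing convention,
# and names are illustrative assumptions, not the accelerator's exact design.

def sparse_row_conv(act_vals, act_idx, w_vals, w_idx, act_len, w_len):
    """Multiply every non-zero activation with every non-zero weight and
    scatter the product to the output position recovered from the indices.
    Zero operands are never fetched, so no multiplier cycle is wasted on them."""
    out = [0] * (act_len - w_len + 1)

    # Recover absolute positions from the relative indices.
    a_pos, pos = [], -1
    for step in act_idx:
        pos += step
        a_pos.append(pos)
    w_pos, pos = [], -1
    for step in w_idx:
        pos += step
        w_pos.append(pos)

    for a, x in zip(act_vals, a_pos):
        for w, k in zip(w_vals, w_pos):
            o = x - k                      # output column for this NZ pair
            if 0 <= o < len(out):
                out[o] += a * w
    return out


# Example: activation row [0, 3, 0, 0, -2, 0, 0, 5] and weight row [1, 0, 2]
# in compressed form; the result equals the dense 1-D cross-correlation.
out = sparse_row_conv([3, -2, 5], [2, 3, 3], [1, 2], [1, 2], act_len=8, w_len=3)
print(out)   # [0, 3, -4, 0, -2, 10]
```

Only pairs of non-zero operands reach the multiply-accumulate step, which is the behavior the zero-operand elimination in the proposed accelerator targets.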

DEEP COMPRESSION FOR SPARSE CNN
PE ARRAY FOR CONVOLUTION
IMPLEMENTATION AND SIMULATION RESULTS
Findings
CONCLUSION