Abstract
The rapid development of machine learning is enabling a wide range of novel applications, such as image and speech recognition on embedded and mobile devices. However, state-of-the-art deep learning models such as convolutional neural networks (CNNs) demand more on-chip storage and compute resources than low-power mobile or embedded systems can provide. To fit large CNN models onto mobile and emerging devices for IoT and cyber-physical applications, we propose an efficient on-chip memory architecture for CNN inference acceleration and demonstrate its application to an in-house single-instruction multiple-data (SIMD) machine learning processor. The redesigned on-chip memory subsystem, Memsqueezer, includes an active weight buffer and a data buffer set that employ specialized compression methods to reduce the footprint of CNN parameters (weights) and activation data, respectively. The Memsqueezer buffers compress the data and weight sets according to the computation dataflow, and they also include a built-in redundancy detection mechanism that actively scans the working set of a CNN and boosts inference performance by eliminating redundant computation in the model. Our experiments show that CNN processors with Memsqueezer buffers achieve more than a $2{\times}$ performance improvement and an 85% average reduction in energy consumption over a conventional buffer design with the same area budget.
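To make the abstract's idea concrete, the following is a minimal sketch, not the paper's actual Memsqueezer design: it stores only nonzero weights as (index, value) pairs, a simple stand-in for the buffer-side weight compression described above, and the sparse dot product then skips the elided entries, illustrating how compression can both shrink storage and remove redundant multiply-accumulate operations. All function names and data here are hypothetical.

```python
def compress(weights):
    """Keep only nonzero weights as (index, value) pairs.

    Stand-in for buffer-side weight compression: zero weights
    occupy no storage and trigger no computation downstream.
    """
    return [(i, w) for i, w in enumerate(weights) if w != 0]

def sparse_dot(compressed_w, activations):
    """Multiply-accumulate only over the surviving nonzero weights,
    skipping the redundant zero-weight MACs entirely."""
    return sum(w * activations[i] for i, w in compressed_w)

# Hypothetical weight/activation vectors for illustration.
weights = [0.0, 1.5, 0.0, 0.0, -2.0, 0.0, 3.0, 0.0]
acts    = [1.0, 2.0, 3.0, 4.0,  5.0, 6.0, 1.0, 2.0]

cw = compress(weights)
print(len(cw))                 # storage shrinks from 8 weights to 3 pairs
print(sparse_dot(cw, acts))    # same result as a dense dot product
```

Real designs use denser encodings (e.g., zero-run lengths rather than full indices) and perform the skipping in hardware, but the storage and compute savings follow the same principle.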
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems