MARS: Multimacro Architecture SRAM CIM-Based Accelerator With Co-Designed Compressed Neural Networks

Syuan-Hao Sie, Kea-Tiong Tang, Chih-Cheng Lu, Zhaofang Li, Chih-Cheng Hsieh, Meng-Fan Chang, Zuo-Wei Yeh, Jye-Luen Lee, Yi-Ren Chen

doi:10.1109/tcad.2021.3082107

Abstract

Convolutional neural networks (CNNs) play a key role in deep learning applications. However, their large storage overhead and substantial computation cost are problematic for hardware accelerators. Computing-in-memory (CIM) architectures have demonstrated great potential for efficiently computing large-scale matrix-vector multiplications. Nevertheless, the intensive multiply-and-accumulate (MAC) operations executed in the crossbar array and the limited capacity of CIM macros remain bottlenecks for further improvement of energy efficiency and throughput. To reduce computation costs, network pruning and quantization are two widely studied compression methods for shrinking the model size. Most model compression algorithms, however, can only be implemented in digital CNN accelerators. For implementation in a static random access memory (SRAM) CIM-based accelerator, the model compression algorithm must consider the hardware limitations of CIM macros, such as the number of word lines and bit lines that can be turned on at the same time, as well as how the weights are mapped to the SRAM CIM macro. In this study, a software and hardware co-design approach is proposed to design an SRAM CIM-based CNN accelerator and an SRAM CIM-aware model compression algorithm. To reduce the high-precision MAC operations required by batch normalization (BN), a quantization algorithm that can fuse BN into the weights is proposed. Furthermore, to reduce the number of network parameters, a sparsity algorithm that takes the CIM architecture into account is proposed. Finally, MARS, a CIM-based CNN accelerator that can use multiple SRAM CIM macros as processing units and support sparse neural networks, is proposed.
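
The abstract does not spell out how BN is fused into the weights. As a rough illustration only, the sketch below shows the standard inference-time BN folding identity for a single linear layer in floating point; it is not the quantization algorithm proposed in the paper. The function names (`fold_batchnorm`, `linear`, `batchnorm`) and the example values are hypothetical and chosen for this sketch.

```python
import math

def fold_batchnorm(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold per-output-channel BN parameters into weights and biases.

    weights holds one row of input weights per output channel.
    Uses the identity
        gamma * (w.x + b - mean) / sqrt(var + eps) + beta
          = (scale * w).x + (scale * (b - mean) + beta),
    where scale = gamma / sqrt(var + eps).
    """
    folded_w, folded_b = [], []
    for w_row, b, g, bt, mu, v in zip(weights, biases, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([scale * w for w in w_row])
        folded_b.append(scale * (b - mu) + bt)
    return folded_w, folded_b

def linear(weights, biases, x):
    """Plain matrix-vector product plus bias (one output per weight row)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time BN applied separately to each output channel."""
    return [g * (yi - mu) / math.sqrt(v + eps) + bt
            for yi, g, bt, mu, v in zip(y, gamma, beta, mean, var)]

# Hypothetical example values: 2 output channels, 3 inputs.
W = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
B = [0.1, -0.2]
GAMMA, BETA = [1.5, 0.8], [0.05, -0.1]
MEAN, VAR = [0.3, -0.4], [0.9, 1.6]
X = [1.0, 2.0, -3.0]

reference = batchnorm(linear(W, B, X), GAMMA, BETA, MEAN, VAR)
W_fused, B_fused = fold_batchnorm(W, B, GAMMA, BETA, MEAN, VAR)
fused = linear(W_fused, B_fused, X)
```

After folding, `fused` matches `reference` up to floating-point rounding, so the separate BN stage is no longer needed at inference. A CIM-oriented scheme would additionally have to quantize the folded weights to the precision the macro supports, which this sketch does not attempt.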

Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Publication Date: Oct 24, 2020
Citations: 34
