Abstract

Computing-in-memory (CIM) improves the energy efficiency of computing by reducing data movement. In edge devices, CIM accelerators need to support lightweight on-chip training to adapt the model to environmental changes and to keep edge data secure. However, most previous CIM accelerators for edge devices support only inference, with training performed in the cloud, because adding on-chip training incurs considerable area overhead and serious performance degradation. In this work, a CIM accelerator based on emerging nonvolatile memory (NVM) is presented with shared-path transpose read and bit-interleaving weight storage for efficient on-chip training in edge devices. The shared-path transpose read employs a new biasing scheme that eliminates the influence of the body effect on the transpose read, improving both read margin and speed. The bit-interleaving weight storage splits multi-bit weights into individual bits that are stored alternately in the array, substantially accelerating the training computation. For 8-bit inputs and weights, evaluation in a 28 nm process shows that the proposed accelerator achieves 3.34/3.06 TOPS/W energy efficiency for feed-forward/back-propagation, 4.6X lower computing latency, and at least 20% smaller chip area compared with the baseline design.
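As a rough, language-level sketch only (not the paper's circuit or array layout), the bit-interleaving idea can be pictured in software as splitting each 8-bit weight into bit planes so that the same bit position of many weights is stored contiguously and can be read out together; the function name `bit_interleave` and the NumPy layout below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def bit_interleave(weights, n_bits=8):
    """Conceptual sketch: split 8-bit weights into bit planes, LSB first.

    Returns an (n_bits, num_weights) array; row b holds bit b of every
    weight, mimicking a storage layout where bits of different weights
    are interleaved rather than kept contiguous per weight.
    """
    # View signed weights through their two's-complement low n_bits.
    w = np.asarray(weights, dtype=np.int64) & ((1 << n_bits) - 1)
    # One row per bit position across all weights.
    planes = np.stack([(w >> b) & 1 for b in range(n_bits)])
    return planes

weights = [-3, 7, 120, -128]          # example 8-bit weights
planes = bit_interleave(weights)
print(planes.shape)                   # (8, 4): 8 bit planes over 4 weights
interleaved = planes.reshape(-1)      # bit 0 of all weights, then bit 1, ...
```

In this toy view, fetching one row of `planes` corresponds to reading the same bit of many weights in a single access, which is the property the bit-interleaved storage exploits to speed up training-phase computation.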
