Abstract

Recently, on-device training has become crucial for the success of edge intelligence. However, frequent data movement between computing units and memory during training has been a major problem for battery-powered edge devices. Processing-in-memory (PIM) is a novel computing paradigm that merges computing logic into memory and can address the data movement problem with excellent power efficiency. However, previous PIM accelerators cannot support the entire training process on chip due to the computational complexity of training. This article presents T-PIM, a PIM accelerator for end-to-end on-device training and the first PIM realization that enables end-to-end on-device training as well as high-speed inference. Its full-custom PIM macro contains 8T-SRAM cells that perform energy-efficient in-cell AND operations, and its bit-serial computation logic enables fully variable bit-precision for input data. The macro supports various data mapping methods and computational paths for both fully connected and convolutional layers in order to handle the complex training process. An efficient tiling scheme is also proposed so that T-PIM can compute deep neural networks of any size with the implemented hardware. In addition, configurable arithmetic units in the forward propagation path allow T-PIM to handle power-of-two bit-precision for weight data, enabling a significant performance boost during inference. Moreover, T-PIM efficiently handles sparsity in both operands by skipping the computation of zeros in the input data and by gating off computing units when the weight data are zero. Finally, we fabricate the T-PIM chip in 28-nm CMOS technology, occupying a die area of 5.04 mm² and integrating five T-PIM cores. It dissipates 5.25–51.23 mW at a 50–280-MHz operating frequency with a 0.75–1.05-V supply voltage. We successfully demonstrate that T-PIM can run end-to-end training of the VGG16 model on the CIFAR10 and CIFAR100 datasets, achieving 0.13–161.08 TOPS/W and 0.25–7.59 TOPS/W power efficiency during inference and training, respectively. The results show that T-PIM is $2.02\times$ more energy-efficient than the state-of-the-art PIM chip, which supports only backward propagation rather than the whole training process. Furthermore, we conduct an architectural experiment using a cycle-level simulator based on the actual measurement results, which suggests that the T-PIM architecture is scalable and that its scaled-up version provides up to $203.26\times$ higher power efficiency than a comparable GPU.
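To make the bit-serial idea concrete, the following is a minimal software sketch of bit-serial multiply-accumulate with the two sparsity optimizations described in the abstract: skipping all-zero input bit-planes and gating off zero weights. It is an illustrative model under stated assumptions, not the actual T-PIM macro behavior; the function name, array sizes, and the 8-bit input format are all assumptions.

```python
# Minimal functional sketch (not the actual T-PIM hardware) of bit-serial
# multiply-accumulate with input zero skipping and weight zero gating.
# All names and fixed-point formats here are illustrative assumptions.

import numpy as np

def bit_serial_mac(inputs, weights, input_bits=8):
    """Accumulate dot(inputs, weights) one input bit-plane at a time.

    inputs  : 1-D array of unsigned integers (< 2**input_bits)
    weights : 1-D array of signed integers
    """
    acc = 0
    for b in range(input_bits):              # one pass per input bit-plane
        bit_plane = (inputs >> b) & 1        # current bit of every input element
        if not bit_plane.any():              # input sparsity: skip all-zero bit-planes
            continue
        active = (bit_plane == 1) & (weights != 0)  # weight sparsity: gate zero weights
        partial = weights[active].sum()      # in-cell AND reduces to a masked sum
        acc += partial << b                  # shift-and-add by bit significance
    return acc

rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=64)            # 8-bit activations
w = rng.integers(-8, 8, size=64)             # 4-bit signed weights
assert bit_serial_mac(x, w) == int(np.dot(x, w))
```

Because the `input_bits` argument sets the number of bit-planes processed, the same loop naturally models the fully variable input bit-precision that the bit-serial logic provides.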

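The tiling scheme can be sketched in the same spirit: a fully connected layer of arbitrary size is decomposed into tiles that each fit a fixed-capacity macro, and partial sums are accumulated across tiles. The 256×256 macro capacity and all names below are illustrative assumptions, not the actual T-PIM array dimensions.

```python
# Hedged sketch of the tiling idea: mapping an arbitrarily sized fully
# connected layer onto a fixed-size PIM macro by iterating over weight tiles.

import numpy as np

MACRO_ROWS, MACRO_COLS = 256, 256   # assumed macro capacity (inputs x outputs)

def tiled_fc(x, W):
    """Compute x @ W with a macro holding at most MACRO_ROWS x MACRO_COLS weights."""
    n_in, n_out = W.shape
    y = np.zeros(n_out, dtype=np.int64)
    for r in range(0, n_in, MACRO_ROWS):          # tile over the input dimension
        for c in range(0, n_out, MACRO_COLS):     # tile over the output dimension
            tile = W[r:r + MACRO_ROWS, c:c + MACRO_COLS]              # one weight tile
            y[c:c + tile.shape[1]] += x[r:r + tile.shape[0]] @ tile   # accumulate partial sums
    return y

rng = np.random.default_rng(1)
x = rng.integers(0, 16, size=1000)               # layer sizes need not divide the tile size
W = rng.integers(-8, 8, size=(1000, 300))
assert np.array_equal(tiled_fc(x, W), x @ W)
```

The ragged-edge slices show why such a scheme lets fixed hardware cover any layer size: edge tiles are simply smaller, and the accumulation over row tiles produces the same result as the untiled product.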