Abstract
State-of-the-art deep convolutional neural networks (CNNs) are widely used in current AI systems and achieve remarkable success in image/speech recognition and classification. A number of recent efforts have attempted to design custom inference engines based on various approaches, including the systolic architecture, near-memory processing, and the processing-in-memory (PIM) approach with emerging technologies such as resistive random access memory (RRAM). However, a comprehensive comparison of these approaches in a unified framework is missing, and the benefits of new designs or emerging technologies are mostly based on qualitative projections. In this paper, we evaluate the energy efficiency and frame rate of a VGG-like CNN inference accelerator on the CIFAR-10 dataset across technological platforms from CMOS to post-CMOS, under a hardware resource constraint, i.e., comparable on-chip area. We also investigate the effects of off-chip DRAM access and interconnect during data movement, which are the bottlenecks of CMOS platforms. Our quantitative analysis shows that the peripheries (ADCs), rather than the memory array, dominate energy consumption and area in the digital RRAM-based parallel-readout PIM architecture. Despite the presence of ADCs, this architecture shows a >2.5× improvement in energy efficiency (TOPS/W) over systolic arrays or near-memory processing, with a comparable frame rate due to reduced DRAM access, high throughput, and optimized parallel readout. Further >10× improvements can be achieved by implementing a bit-count-reduced XNOR network and pipelining.
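As a rough illustration of how the reported metrics relate, the minimal sketch below computes energy efficiency (TOPS/W) and frame rate from a per-frame operation count, per-frame energy, and per-frame latency. All numeric values are placeholder assumptions for illustration only, not results or parameters from this paper.

```python
# Illustrative back-of-envelope estimate of TOPS/W and frame rate for a
# CNN inference accelerator. All numbers below are hypothetical
# placeholders, not values reported in the paper.

ops_per_frame = 0.6e9       # operations per CIFAR-10 inference (assumed)
energy_per_frame_j = 2e-4   # energy per inference in joules (assumed)
latency_per_frame_s = 1e-3  # latency per inference in seconds (assumed)

# Energy efficiency: operations per joule, expressed as tera-ops per watt.
tops_per_watt = (ops_per_frame / energy_per_frame_j) / 1e12

# Throughput: frames processed per second (ignoring pipelining overlap).
frame_rate = 1.0 / latency_per_frame_s

print(f"Energy efficiency: {tops_per_watt:.2f} TOPS/W")
print(f"Frame rate:        {frame_rate:.0f} frames/s")
```

Under these assumed numbers the two metrics trade off independently: reducing DRAM access and ADC energy raises TOPS/W, while parallel readout and pipelining shorten per-frame latency and thus raise the frame rate.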