High-Throughput, Area-Efficient, and Variation-Tolerant 3-D In-Memory Compute System for Deep Convolutional Neural Networks

Hasita Veluri, Aaron Voon-Yew Thean, Jessie Xuhua Niu, Yida Li, Evgeny Zamburg

doi:10.1109/jiot.2021.3058015

Abstract

Untethered computing using deep convolutional neural networks (DCNNs) at the edge of IoT with limited resources requires systems that are exceedingly power- and area-efficient. Analog in-memory matrix-matrix multiplications (M2M) enabled by emerging memories can significantly reduce the energy budget of such systems and result in compact accelerators. In this article, we report a high-throughput RRAM-based DCNN processor that boasts $7.12\times$ area-efficiency (AE) and $6.52\times$ power-efficiency (PE) enhancements over state-of-the-art accelerators. We achieve this by coupling a novel in-memory computing methodology with a staggered 3D memristor array. Our variation-tolerant in-memory compute method, which performs operations on signed floating-point numbers within a single array, leverages charge-domain operations and conductance discretization to reduce peripheral overheads. Voltage pulses applied at the staggered bottom electrodes of the 3D array generate a concurrent input shift and parallelize convolution operations to boost throughput. The high density and low footprint of the 3D array, along with the modified in-memory M2M execution, improve peak AE to 9.1 TOPs mm$^{-2}$, while the elimination of input regeneration improves PE to 10.6 TOPs W$^{-1}$. This work provides a path towards infallible RRAM-based hardware accelerators that are fast, low power, and low area.

Full Text