Abstract

We introduce “aCortex,” an extremely energy-efficient, fast, compact, and versatile neuromorphic processor architecture suitable for accelerating a wide range of neural network inference models. The most important feature of our processor is a configurable mixed-signal computing array of vector-by-matrix multiplier (VMM) blocks that use embedded nonvolatile memory arrays to store weight matrices. Analog peripheral circuitry for data conversion and high-voltage programming is shared among a large array of VMM blocks to enable compact and energy-efficient analog-domain VMM operation across different types of neural network layers. Other unique features of aCortex include a configurable chain of buffers and data buses, a simple and efficient instruction set architecture with its corresponding multiagent controller, a programmable quantization range, and a customized refresh-free embedded dynamic random access memory. The energy-optimal aCortex with 4-bit analog computing precision was designed in a 55-nm process with embedded NOR flash memory. Its physical performance was evaluated, using experimental data from individual circuit elements and the physical layout of key components, on several common benchmarks: Inception-v1 and ResNet-152, two state-of-the-art deep feedforward networks for image classification, and GNMT, Google’s deep recurrent network for language translation. The system-level simulation results for these benchmarks show energy efficiencies of 97, 106, and 336 TOp/J, respectively, combined with a computing throughput of up to 15 TOp/s and a storage efficiency of 0.27 MB/mm2. These estimated performance figures compare favorably with those of previously reported mixed-signal accelerators based on much less mature, aggressively scaled resistive switching memories.
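
To make the analog VMM idea behind this architecture concrete, the following minimal sketch models a single VMM pass in NumPy: weights are mapped to cell conductances, 4-bit-quantized inputs are applied as read voltages, and the column currents obtained by Kirchhoff summation are digitized. This is only a conceptual illustration, not the authors’ circuit design; the parameters g_max, v_read, and the quantization scheme are assumptions, and signed weights (typically handled in hardware with differential cell pairs) are omitted.

```python
import numpy as np

def quantize(x, bits=4, x_max=1.0):
    """Uniformly quantize x to the given bit width over [0, x_max]."""
    levels = 2 ** bits - 1
    return np.round(np.clip(x, 0.0, x_max) / x_max * levels) / levels * x_max

def analog_vmm(x, w, g_max=1e-6, v_read=0.1, bits=4):
    """Toy model of one analog VMM pass: weights stored as cell conductances,
    inputs applied as read voltages, outputs read out as summed currents."""
    g = np.clip(w, 0.0, 1.0) * g_max     # weights -> conductances (S)
    v_in = quantize(x, bits) * v_read    # DAC: digital inputs -> read voltages (V)
    i_out = g.T @ v_in                   # Kirchhoff current summation (A)
    # ADC: normalize the column currents to [0, 1] and digitize.
    return quantize(i_out / (g_max * v_read * len(x)), bits)

rng = np.random.default_rng(0)
w = rng.uniform(0.0, 1.0, size=(8, 4))   # one 8x4 weight block (normalized)
x = rng.uniform(0.0, 1.0, size=8)        # 8-element input vector
print(analog_vmm(x, w))
```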

Highlights

  • The rapidly growing range of applications of machine learning for image classification, speech recognition, and natural language processing, along with the maturing of neural network algorithms, especially for deep learning, has led to an urgent need for specialized neuromorphic hardware [1]–[3]

  • We developed a system-level estimator that imports the target network’s computational graph along with experimental and circuit-level simulation results for different architecture components, including digital-to-analog converters (DACs), analog-to-digital converters (ADCs), sense amplifiers, memory cells, digital blocks, and buses, maps the weight kernels onto the 2-D array of nonvolatile memory (NVM) blocks, and produces a comprehensive performance report that accounts for various nonidealities, such as leakages and line parasitics

  • Each mixed-signal processing unit (MSPU) comprises two N-by-M arrays of vector-by-matrix multiplier (VMM) circuit blocks located on each side of a column of N neuron blocks (a minimal weight-mapping sketch follows this list)
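
To illustrate how weight kernels might be partitioned across such a grid of VMM blocks, the sketch below tiles a layer’s weight matrix into fixed-size blocks and recombines the partial products. The 64-by-64 block size and the helper functions are hypothetical and are not the paper’s parameters; the sketch only shows that a block-tiled multiply reproduces the dense result.

```python
import numpy as np

BLOCK_ROWS, BLOCK_COLS = 64, 64   # assumed VMM block dimensions (illustrative)

def tile_weights(w):
    """Split a layer's weight matrix into fixed-size, zero-padded tiles,
    one per VMM block, keyed by their (row, column) position in the grid."""
    n_r = -(-w.shape[0] // BLOCK_ROWS)   # ceil division
    n_c = -(-w.shape[1] // BLOCK_COLS)
    padded = np.zeros((n_r * BLOCK_ROWS, n_c * BLOCK_COLS))
    padded[:w.shape[0], :w.shape[1]] = w
    return {(r, c): padded[r*BLOCK_ROWS:(r+1)*BLOCK_ROWS,
                           c*BLOCK_COLS:(c+1)*BLOCK_COLS]
            for r in range(n_r) for c in range(n_c)}

def tiled_vmm(x, tiles):
    """Compute w.T @ x by summing the partial products of all tiles that
    share a column of the block grid, as a shared neuron column would."""
    n_r = max(r for r, _ in tiles) + 1
    n_c = max(c for _, c in tiles) + 1
    x_pad = np.zeros(n_r * BLOCK_ROWS)
    x_pad[:len(x)] = x
    cols = [sum(tiles[(r, c)].T @ x_pad[r*BLOCK_ROWS:(r+1)*BLOCK_ROWS]
                for r in range(n_r))
            for c in range(n_c)]
    return np.concatenate(cols)

# Sanity check against a dense reference multiply (padding columns trimmed).
rng = np.random.default_rng(1)
w = rng.standard_normal((200, 150))
x = rng.standard_normal(200)
assert np.allclose(tiled_vmm(x, tile_weights(w))[:w.shape[1]], w.T @ x)
```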

Summary

INTRODUCTION

The rapidly growing range of applications of machine learning for image classification, speech recognition, and natural language processing, along with the maturing of neural network algorithms, especially for deep learning, has led to an urgent need for specialized neuromorphic hardware [1]–[3]. We developed a system-level estimator that imports the target network’s computational graph along with experimental and circuit-level simulation results for different architecture components, including digital-to-analog converters (DACs), analog-to-digital converters (ADCs), sense amplifiers, memory cells, digital blocks, and buses, maps the weight kernels onto the 2-D array of nonvolatile memory (NVM) blocks, and produces a comprehensive performance report that accounts for various nonidealities, such as leakages and line parasitics. Using this estimator, we perform a detailed performance analysis based on the actual layout in a 55-nm process with embedded NOR flash memory. Related prior works are discussed and compared with aCortex in Section S.III in the Supplementary Material.
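
A heavily simplified outline of such an estimator is sketched below: it walks a list of layer shapes, maps each layer onto an assumed grid of VMM blocks, and accumulates energy and latency from a per-component cost table. All numbers and component names here are placeholders, not the measured 55-nm data used in the paper, and real mappings (convolutions, recurrent layers, buffer traffic, nonidealities) are far more involved.

```python
# Hypothetical per-component cost table (energy in pJ per use, latency in ns);
# in a real estimator these entries would come from measurements and
# circuit-level simulation, not from guesses like the ones below.
COSTS = {
    "dac": {"energy_pj": 0.05, "latency_ns": 1.0},
    "vmm": {"energy_pj": 0.50, "latency_ns": 10.0},
    "adc": {"energy_pj": 0.20, "latency_ns": 2.0},
    "bus": {"energy_pj": 0.01, "latency_ns": 0.5},
}
BLOCK_ROWS, BLOCK_COLS = 64, 64   # assumed VMM block size (illustrative)

def estimate_layer(in_size, out_size):
    """Rough energy/latency estimate for one fully connected layer mapped
    onto a grid of fixed-size VMM blocks operating in parallel."""
    n_blocks = (-(-in_size // BLOCK_ROWS)) * (-(-out_size // BLOCK_COLS))
    energy_pj = (in_size * COSTS["dac"]["energy_pj"]
                 + n_blocks * COSTS["vmm"]["energy_pj"]
                 + out_size * COSTS["adc"]["energy_pj"]
                 + (in_size + out_size) * COSTS["bus"]["energy_pj"])
    # With all blocks in parallel, latency is one DAC -> VMM -> ADC -> bus pass.
    latency_ns = sum(c["latency_ns"] for c in COSTS.values())
    return energy_pj, latency_ns

def estimate_network(layers):
    """Accumulate estimates over a list of (in_size, out_size) layer shapes."""
    per_layer = [estimate_layer(i, o) for i, o in layers]
    return sum(e for e, _ in per_layer), sum(t for _, t in per_layer)

print(estimate_network([(784, 256), (256, 10)]))   # (total pJ, total ns)
```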

TOP-LEVEL ARCHITECTURE
MIXED-SIGNAL PROCESSING UNIT
CIRCUIT DESIGN AND PERFORMANCE EVALUATION
