DIANA: An End-to-End Energy-Efficient Digital and ANAlog Hybrid Neural Network SoC

Kodai Ueyoshi, Marian Verhelst, Sebastian Giraldo, Ioannis A Papistas, Pouya Houshmand, Debjyoti Bhattacharjee, Stefan Cosemans, Jonas Doevenspeck, Man Shi, Qilin Zheng, Vikram Jain, Peter Vrancx, Diederik Verkest, Arindam Mallik, Peter Debacker, Giuseppe M Sarda

doi:10.1109/isscc42614.2022.9731716

Abstract

Energy-efficient matrix-vector multiplications (MVMs) are key to bringing neural network (NN) inference to edge devices. This has led to a wide range of state-of-the-art MVM acceleration chips, which fall into two categories: 1) digital NN accelerators [1]–[2], consisting of widely parallel multiply-accumulate (MAC) arrays at medium (typically 4-8b) precision; and 2) analog in-memory compute (AiMC) NN accelerators [3]–[4], which enable much higher energy efficiency and throughput per unit area at the cost of reduced computational precision, reduced dataflow flexibility, and a resulting reduction in mapping efficiency for some layer configurations. Neither approach dominates the other: which one is optimal depends on the layer type. The ideal processor would exploit both digital and AiMC NN acceleration and select the best accelerator according to the layer characteristics. Consequently, this work presents DIANA, a low-power NN processing SoC comprising a precision-scalable digital NN accelerator, an AiMC core, an optimized shared-memory subsystem, and a RISC-V host processor, to achieve state-of-the-art (SOTA) end-to-end inference at the edge. This SoC includes innovations in: a) its 16x16 digital NN core with flexible dataflow for fully connected (FC) and high-precision CONV layer execution, b) its 1152x512 AiMC core with SIMD digital post-processing and support for output unrolling to improve array utilization, and c) a shared memory system supporting efficient layer-fused execution schedules, controlled by the RISC-V host. This allows simultaneous execution of subsequent layers across the digital and analog cores, assigning high-precision layers and layers with limited AiMC utilization (e.g., FC layers and layers with low channel count) to the digital core, and all other intermediate layers to the AiMC core. A top-level overview of the designed system and its highlights is depicted in Fig. 15.6.1.
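
The layer-to-core assignment described above can be illustrated with a minimal Python sketch. It is not the DIANA scheduler. Only the 1152x512 array size and the qualitative rule (high-precision, FC, and low-utilization layers go to the digital core) come from the abstract. The `Layer` fields, the `aimc_utilization` and `assign_core` helpers, the mapping of unrolled kernel inputs to array rows and output channels to array columns, and the 10% utilization threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# AiMC array dimensions as stated in the abstract.
AIMC_ROWS, AIMC_COLS = 1152, 512


@dataclass
class Layer:
    """Hypothetical layer descriptor, not a DIANA data structure."""
    name: str
    kind: str                   # "conv" or "fc"
    in_channels: int
    out_channels: int
    kernel: int = 1             # square kernel size
    high_precision: bool = False


def aimc_utilization(layer: Layer) -> float:
    """Fraction of the AiMC array occupied by one layer.

    Assumption: the unrolled kernel (kernel * kernel * in_channels) maps to
    rows and output channels map to columns; anything larger is clipped.
    """
    rows = min(layer.kernel * layer.kernel * layer.in_channels, AIMC_ROWS)
    cols = min(layer.out_channels, AIMC_COLS)
    return (rows * cols) / (AIMC_ROWS * AIMC_COLS)


def assign_core(layer: Layer, min_utilization: float = 0.1) -> str:
    """Return "digital" or "aimc" following the rule stated in the abstract.

    The min_utilization threshold is an assumed value, not from the paper.
    """
    if layer.high_precision or layer.kind == "fc":
        return "digital"
    if aimc_utilization(layer) < min_utilization:
        return "digital"
    return "aimc"


# Example network with made-up layer shapes.
network = [
    Layer("conv_in", "conv", 3, 16, kernel=3, high_precision=True),
    Layer("conv_mid", "conv", 128, 256, kernel=3),
    Layer("conv_small", "conv", 3, 16, kernel=3),
    Layer("classifier", "fc", 512, 10),
]
assignment = {layer.name: assign_core(layer) for layer in network}
```

Under these assumptions, a 3x3 CONV layer with 128 input channels fills all 1152 rows, so only mid-network layers with enough output channels keep the array well occupied, while first and last layers fall back to the digital core.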

Full Text