Abstract

Device quantization of in-memory computing (IMC) that considers the non-negligible variation and finite dynamic range of practical memory technology is investigated, aiming to quantitatively co-optimize system performance in accuracy, power, and area. Architecture- and algorithm-level solutions are taken into consideration. Weight-separate mapping, a VGG-like network, multiple cells per weight, and fine-tuning of the classifier layer are effective at suppressing the inference accuracy loss caused by variation, and they allow the lowest possible weight precision, improving area and energy efficiency. Higher priority should be given to developing low-conductance and low-variability memory devices, which are essential for energy- and area-efficient IMC, whereas low bit precision (< 3 b) and a small memory window (< 10) are of less concern.
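As an illustration of the "multiple cells per weight" scheme mentioned above, the following is a minimal sketch of how a quantized weight could be split across several memory cells whose programmed conductances carry device-to-device variation. It is not the paper's implementation; the 2-bit cells, conductance range, and 5% relative variation are assumptions for illustration only.

```python
import numpy as np

def map_weight_to_cells(w_int, n_cells, g_min=1e-6, g_max=1e-5,
                        sigma_rel=0.05, rng=None):
    """Split an unsigned integer weight across n_cells memory cells.

    Each cell stores one base-L digit of the weight (L = number of
    conductance levels per cell), and its programmed conductance is
    perturbed by relative Gaussian variation sigma_rel. The conductance
    range and 5% variation are illustrative assumptions.
    """
    if rng is None:
        rng = np.random.default_rng()
    levels_per_cell = 4                      # assume 2-bit cells (4 states)
    digits = []
    for _ in range(n_cells):                 # least-significant digit first
        digits.append(w_int % levels_per_cell)
        w_int //= levels_per_cell
    # ideal conductance of each cell, linearly spaced between g_min and g_max
    g_ideal = g_min + np.array(digits) / (levels_per_cell - 1) * (g_max - g_min)
    # programmed conductance with device-to-device variation
    g_real = g_ideal * (1 + sigma_rel * rng.standard_normal(n_cells))
    return g_real

# example: a 4-bit weight value 11 mapped onto two 2-bit cells
print(map_weight_to_cells(11, n_cells=2))
```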

Highlights

  • Device quantization of in-memory computing (IMC) that considers the non-negligible variation and finite dynamic range of practical memory technology is investigated, aiming to quantitatively co-optimize system performance in accuracy, power, and area

  • While the emergence of quantized neural networks (QNNs) opens up the opportunity to implement IMC using emerging non-volatile memory (NVM), practical implementation is largely impeded by imperfect memory characteristics; in particular, only a limited number of quantized memory states is available in the presence of intrinsic device variation

  • A weight matrix W_{M×N} is assigned to an input vector I_{M×1} (see the crossbar sketch after this list)

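The highlight on W_{M×N} and I_{M×1} refers to the analog matrix-vector multiply a crossbar performs: the M-element input vector drives the rows, and each of the N columns sums currents through its weights. The sketch below shows how device variation on the stored conductances propagates into the column outputs; the array size, weight levels, and 5% variation are assumed for illustration and are not taken from the paper.

```python
import numpy as np

def crossbar_mvm(W, I_in, sigma_rel=0.05, rng=None):
    """Analog matrix-vector multiply O = W^T @ I on an M x N crossbar.

    W    : (M, N) conductance matrix, one column per output.
    I_in : (M,) input vector applied to the M rows.
    Device variation is modeled as multiplicative Gaussian noise on each
    conductance (sigma_rel is an assumed 5% relative spread).
    """
    if rng is None:
        rng = np.random.default_rng()
    W_real = W * (1 + sigma_rel * rng.standard_normal(W.shape))
    return W_real.T @ I_in                        # N column currents

# example: M = 8 inputs, N = 4 outputs, 2-bit quantized weights
rng = np.random.default_rng(0)
W = rng.integers(0, 4, size=(8, 4)) / 3.0         # 4 weight levels scaled to [0, 1]
I_in = rng.integers(0, 2, size=8).astype(float)   # binary inputs
print("ideal   :", W.T @ I_in)
print("with var:", crossbar_mvm(W, I_in, rng=rng))
```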

Summary

Introduction

Device quantization of in-memory computing (IMC) that considers the non-negligible variation and finite dynamic range of practical memory technology is investigated, aiming to quantitatively co-optimize system performance in accuracy, power, and area. Compared with generic DNNs using floating-point weights and activations, QNNs demonstrate substantial speedup and a tremendous reduction in chip area and power [4]. These gains are achieved with nearly no or only minor accuracy degradation in inference tasks on the complex CIFAR-10 or ImageNet data [3]. Taking into account other circuit-level constraints, such as limited current-summing capability and peripheral circuit overhead, the energy and area efficiency of variation-aware IMC designs are compared to provide a useful guideline for future IMC power-performance-area (PPA) co-optimization.
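To make the QNN premise concrete, the sketch below shows a generic symmetric uniform quantizer that maps floating-point weights onto a handful of signed levels, which is what allows weights to be stored in a small number of memory states. This is not the paper's specific training scheme; the 3-bit setting merely echoes the < 3 b precision discussed in the abstract.

```python
import numpy as np

def quantize_weights(w, n_bits=3):
    """Uniformly quantize floating-point weights to n_bits signed levels.

    A generic symmetric uniform quantizer used here only for illustration;
    the bit width and scaling rule are assumptions, not the paper's method.
    """
    n_levels = 2 ** (n_bits - 1) - 1           # e.g. 3 positive levels for 3 bits
    scale = np.max(np.abs(w)) / n_levels
    w_q = np.clip(np.round(w / scale), -n_levels, n_levels)
    return w_q * scale, w_q.astype(int)        # dequantized weights and integer codes

# example: quantize a small random weight tensor and check the worst-case error
w = np.random.default_rng(1).standard_normal((4, 4))
w_dq, w_int = quantize_weights(w, n_bits=3)
print("max abs error:", np.max(np.abs(w - w_dq)))
```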

