Abstract

Device quantization of in-memory computing (IMC) that considers the non-negligible variation and finite dynamic range of practical memory technology is investigated, aiming to quantitatively co-optimize system performance in accuracy, power, and area. Architecture- and algorithm-level solutions are taken into consideration. Weight-separate mapping, a VGG-like network, multiple cells per weight, and fine-tuning of the classifier layer are effective at suppressing the inference accuracy loss caused by variation, and they allow the lowest possible weight precision, improving area and energy efficiency. Higher priority should be given to developing low-conductance and low-variability memory devices, which are essential for energy- and area-efficient IMC, whereas low bit precision (< 3 b) and a small memory window (< 10) are of less concern.
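As an illustration of the "multiple cells per weight" scheme mentioned above, the following is a minimal sketch of how a quantized weight could be split across several memory cells whose programmed conductances carry device-to-device variation. It is not the paper's implementation; the 2-bit cells, conductance range, and 5% relative variation are assumptions for illustration only.

```python
import numpy as np

def map_weight_to_cells(w_int, n_cells, g_min=1e-6, g_max=1e-5,
                        sigma_rel=0.05, rng=None):
    """Split an unsigned integer weight across n_cells memory cells.

    Each cell stores one base-L digit of the weight (L = number of
    conductance levels per cell), and its programmed conductance is
    perturbed by relative Gaussian variation sigma_rel. The conductance
    range and 5% variation are illustrative assumptions.
    """
    if rng is None:
        rng = np.random.default_rng()
    levels_per_cell = 4                      # assume 2-bit cells (4 states)
    digits = []
    for _ in range(n_cells):                 # least-significant digit first
        digits.append(w_int % levels_per_cell)
        w_int //= levels_per_cell
    # ideal conductance of each cell, linearly spaced between g_min and g_max
    g_ideal = g_min + np.array(digits) / (levels_per_cell - 1) * (g_max - g_min)
    # programmed conductance with device-to-device variation
    g_real = g_ideal * (1 + sigma_rel * rng.standard_normal(n_cells))
    return g_real

# example: a 4-bit weight value 11 mapped onto two 2-bit cells
print(map_weight_to_cells(11, n_cells=2))
```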

Highlights

  • Device quantization of in-memory computing (IMC) that considers the non-negligible variation and finite dynamic range of practical memory technology is investigated, aiming to quantitatively co-optimize system performance in accuracy, power, and area

  • While the emergence of quantized neural networks (QNNs) opens up the opportunity to implement IMC using emerging non-volatile memory (NVM), practical implementation is largely impeded by imperfect memory characteristics; in particular, only a limited number of quantized memory states is available in the presence of intrinsic device variation

  • A weight matrix W_{M×N} is assigned to an input vector I_{M×1} (see the crossbar sketch after this list)

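The highlight on W_{M×N} and I_{M×1} refers to the analog matrix-vector multiply a crossbar performs: the M-element input vector drives the rows, and each of the N columns sums currents through its weights. The sketch below shows how device variation on the stored conductances propagates into the column outputs; the array size, weight levels, and 5% variation are assumed for illustration and are not taken from the paper.

```python
import numpy as np

def crossbar_mvm(W, I_in, sigma_rel=0.05, rng=None):
    """Analog matrix-vector multiply O = W^T @ I on an M x N crossbar.

    W    : (M, N) conductance matrix, one column per output.
    I_in : (M,) input vector applied to the M rows.
    Device variation is modeled as multiplicative Gaussian noise on each
    conductance (sigma_rel is an assumed 5% relative spread).
    """
    if rng is None:
        rng = np.random.default_rng()
    W_real = W * (1 + sigma_rel * rng.standard_normal(W.shape))
    return W_real.T @ I_in                        # N column currents

# example: M = 8 inputs, N = 4 outputs, 2-bit quantized weights
rng = np.random.default_rng(0)
W = rng.integers(0, 4, size=(8, 4)) / 3.0         # 4 weight levels scaled to [0, 1]
I_in = rng.integers(0, 2, size=8).astype(float)   # binary inputs
print("ideal   :", W.T @ I_in)
print("with var:", crossbar_mvm(W, I_in, rng=rng))
```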

Summary

Introduction

Device quantization of in-memory computing (IMC) that considers the non-negligible variation and finite dynamic range of practical memory technology is investigated, aiming to quantitatively co-optimize system performance in accuracy, power, and area. Compared with generic DNNs using floating-point weights and activations, QNNs demonstrate substantial speedup and a tremendous reduction in chip area and power [4]. These gains are achieved with nearly no or only minor accuracy degradation in inference tasks on the complex CIFAR-10 or ImageNet data [3]. Taking into account other circuit-level constraints, such as limited current-summing capability and peripheral circuit overhead, the energy and area efficiency of variation-aware IMC designs are compared to provide a useful guideline for future IMC power-performance-area (PPA) co-optimization.
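To make the QNN premise concrete, the sketch below shows a generic symmetric uniform quantizer that maps floating-point weights onto a handful of signed levels, which is what allows weights to be stored in a small number of memory states. This is not the paper's specific training scheme; the 3-bit setting merely echoes the < 3 b precision discussed in the abstract.

```python
import numpy as np

def quantize_weights(w, n_bits=3):
    """Uniformly quantize floating-point weights to n_bits signed levels.

    A generic symmetric uniform quantizer used here only for illustration;
    the bit width and scaling rule are assumptions, not the paper's method.
    """
    n_levels = 2 ** (n_bits - 1) - 1           # e.g. 3 positive levels for 3 bits
    scale = np.max(np.abs(w)) / n_levels
    w_q = np.clip(np.round(w / scale), -n_levels, n_levels)
    return w_q * scale, w_q.astype(int)        # dequantized weights and integer codes

# example: quantize a small random weight tensor and check the worst-case error
w = np.random.default_rng(1).standard_normal((4, 4))
w_dq, w_int = quantize_weights(w, n_bits=3)
print("max abs error:", np.max(np.abs(w - w_dq)))
```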

