Abstract

Nonvolatile computing-in-memory (nvCIM) exhibits high potential for neuromorphic computing involving massive parallel computation and for achieving high energy efficiency. nvCIM is especially suitable for deep neural networks, which must perform large numbers of matrix–vector multiplications. However, a comprehensive quantization algorithm that overcomes the hardware limitations of resistive random access memory (ReRAM)-based nvCIM, such as the limited numbers of I/Os, word lines (WLs), and ADC output bits, has yet to be developed. In this article, we propose a quantization training method for compressing deep models. The method comprises three steps: input and weight quantization, ReRAM convolution (ReConv), and ADC quantization. ADC quantization addresses the error-sampling problem by using the Gumbel-softmax trick. Under a 4-bit ADC in nvCIM, accuracy decreases by only 0.05% and 1.31% on MNIST and CIFAR-10, respectively, compared with the accuracies obtained under an ideal ADC. The experimental results indicate that the proposed method effectively compensates for the hardware limitations of nvCIM macros.
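
As a concrete illustration of the ADC-quantization step, here is a minimal PyTorch sketch that samples a discrete ADC output with the Gumbel-softmax trick so the sampling error remains differentiable during training. The distance-based logits, uniform level spacing, and temperature are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: differentiable ADC quantization via Gumbel-softmax.
import torch
import torch.nn.functional as F

def gumbel_softmax_adc(i_bl, levels, tau=1.0):
    # Logits: negative squared distance from the bitline current to each
    # representable ADC level, so nearby levels are the most probable.
    logits = -(i_bl.unsqueeze(-1) - levels) ** 2
    # hard=True returns a one-hot sample with a straight-through gradient,
    # so the stochastic sensing error stays trainable end to end.
    y = F.gumbel_softmax(logits, tau=tau, hard=True)
    return (y * levels).sum(dim=-1)

levels = torch.linspace(0.0, 1.0, 16)        # 16 levels for a 4-bit ADC
i_bl = torch.rand(8, requires_grad=True)     # toy bitline currents
out = gumbel_softmax_adc(i_bl, levels, tau=0.5)
out.sum().backward()                         # gradients reach i_bl
```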

Highlights

  • Deep neural networks (DNNs) have highly flexible parametric properties, and these properties are being exploited to develop artificial intelligence (AI) applications in domains ranging from cloud computing to edge computing

  • Under a 4-bit ADC in nonvolatile computing-in-memory (nvCIM), accuracy decreases by only 0.05% and 1.31% on MNIST and CIFAR-10, respectively, compared with the accuracies obtained under an ideal ADC

  • Based on our analysis of nvCIM, we propose a quantization scheme that accounts for its hardware limitations

Summary

INTRODUCTION

Deep neural networks (DNNs) have highly flexible parametric properties, and these properties are being exploited to develop artificial intelligence (AI) applications in domains ranging from cloud computing to edge computing. The techniques for achieving these improvements involve the design of the entire nvCIM macro, the physical characteristics of ReRAM, and the precision of the ADC outputs. Given the input patterns and process variation discussed earlier, increasing the input precision yields closely overlapping bitline-current (IBL) distributions when the MAC value is high [see Fig. 3(b)]. One approach is to use multiple single-level-cell (SLC) ReRAM cells to represent one weight value. However, this increases the area cost and the complexity of the MAC operation, which in turn affects the power and latency of the entire input process. As IBL increases under multibit MAC operations, the input offset of the ADC grows, which reduces the accuracy of the sensing output [see Fig. 4(a)]. Owing to the limitations of ReRAM cells and the need to preserve model accuracy, the proposed weight quantization focuses on the design of SLC ReRAM.
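
To illustrate the SLC mapping described above, the following is a minimal NumPy sketch in which each bit of a multi-bit weight is stored in a separate single-level cell and the MAC result is recombined by shift-and-add. The 4-bit weight width and LSB-first ordering are illustrative assumptions.

```python
# Hedged sketch: one multi-bit weight split across several SLC ReRAM
# bit-planes, with the MAC recombined from per-bit bitline sums.
import numpy as np

def slc_mac(inputs, weights, n_bits=4):
    """Compute sum(inputs * weights) with weights split into SLC bit-planes."""
    acc = 0
    for b in range(n_bits):
        bit_plane = (weights >> b) & 1      # one SLC ReRAM array per bit
        i_bl = np.dot(inputs, bit_plane)    # bitline current for this plane
        acc += i_bl * (1 << b)              # shift-and-add recombination
    return acc

inputs = np.array([1, 0, 1, 1])             # binary WL inputs
weights = np.array([5, 3, 7, 2])            # 4-bit weights
assert slc_mac(inputs, weights) == np.dot(inputs, weights)
```

The sketch makes the cost trade-off above visible: each extra weight bit requires one more SLC array and one more sensing pass, which is exactly the area and latency overhead the analysis attributes to multibit MAC operations.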

ReRAM CONVOLUTION
[Algorithm listing: ReConv. Step 5: optionally apply pooling.]
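
Since only step 5 of the ReConv listing is preserved, the following is a minimal Python sketch of a ReConv-style dot product under assumed hardware limits: a fixed number of simultaneously active word lines and a low-bit ADC on each partial sum. The function names, the word-line group size, and the uniform ADC model are illustrative assumptions rather than the paper's exact algorithm.

```python
# Hedged sketch: convolution dot product split across word-line groups,
# with each partial sum quantized by an assumed 4-bit ADC.
import numpy as np

def adc_quantize(partial_sum, adc_bits, max_val):
    """Clamp and round a partial sum to 2^adc_bits uniformly spaced levels."""
    levels = 2 ** adc_bits - 1
    code = np.clip(np.round(partial_sum / max_val * levels), 0, levels)
    return code / levels * max_val

def reconv_dot(x_bits, w_bits, max_wl=9, adc_bits=4):
    """Dot product of binary inputs and binary (SLC) weights, computed in
    word-line groups of size max_wl, each sensed through the ADC."""
    total = 0.0
    for start in range(0, len(x_bits), max_wl):
        seg = np.dot(x_bits[start:start + max_wl],
                     w_bits[start:start + max_wl])
        total += adc_quantize(seg, adc_bits, max_val=max_wl)
    return total

x = np.random.randint(0, 2, 27)   # e.g., a flattened 3x3x3 input patch
w = np.random.randint(0, 2, 27)
print(reconv_dot(x, w))
```

Pooling (step 5 in the listing) would then be applied to the resulting feature map after all ReConv dot products complete.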
CONCLUSION