Abstract

Resistive random access memory (RRAM) is a promising technology for energy-efficient neuromorphic accelerators. However, when a pretrained deep neural network (DNN) model is programmed to an RRAM array for inference, the model suffers from accuracy degradation due to RRAM nonidealities, such as device variations, quantization error, and stuck-at faults. Previous solutions involving multiple read–verify–write (R-V-W) operations on the RRAM cells require cell-by-cell compensation and, thus, an excessive amount of processing time. In this article, we propose a joint algorithm-design solution to mitigate the accuracy degradation. We first leverage knowledge distillation (KD), where the model is trained with the RRAM nonidealities to increase the robustness of the model under device variations. Furthermore, we propose random sparse adaptation (RSA), which integrates a small on-chip memory with the main RRAM array for postmapping adaptation. Only the on-chip memory is updated to recover the inference accuracy. The joint algorithm-design solution achieves the state-of-the-art accuracy of 99.41% for MNIST (LeNet-5) and 91.86% for CIFAR-10 (VGG-16) with up to 5% of the parameters as overhead while providing a 15–150× speedup compared with R-V-W.
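The RSA idea above can be sketched numerically: a random, sparse subset of the mapped weights is mirrored in a small on-chip memory, and only those mirrored values are updated during adaptation, while the noisy RRAM array is never reprogrammed. The sketch below is illustrative only; the layer shape, the 5% density, and the lognormal device-variation model are assumptions for demonstration, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pretrained layer weights (in practice, taken from the DNN).
W = rng.standard_normal((128, 64)).astype(np.float32)

# RRAM nonideality: device variation modeled here as multiplicative
# lognormal noise on each programmed cell (an assumed noise model).
sigma = 0.1
W_rram = W * rng.lognormal(mean=0.0, sigma=sigma, size=W.shape).astype(np.float32)

# Random sparse adaptation: pick ~5% of the cells to mirror on-chip.
density = 0.05
mask = rng.random(W.shape) < density

# The on-chip copies start from the mapped values; during adaptation,
# gradient updates would touch only W_sram[mask], never the RRAM array.
W_sram = W_rram.copy()

# Effective weights seen at inference: on-chip values shadow the RRAM
# cells they mirror; all other cells read straight from RRAM.
W_eff = np.where(mask, W_sram, W_rram)
```

The key property is that the adaptation footprint is bounded by the mask density, so the on-chip memory overhead stays at roughly 5% of the parameters while the bulk of the model remains in dense RRAM.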

Highlights

  • Today, deep neural networks (DNNs) have achieved or even surpassed human-level performance in many fields, such as image recognition [1], natural language processing [2], and robotics

  • The knowledge distillation (KD)+random sparse adaptation (RSA) accuracy corresponds to the scenario where both KD-based variation-aware training (VAT) and RSA are performed

  • KD+RSA, with up to 5% of the parameters on the on-chip memory, outperforms all the previous approaches and achieves the state-of-the-art inference accuracy of 91.86% and 99.13% for CIFAR-10 and MNIST, respectively
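The KD component of the highlights above combines soft targets from a teacher model with the ground-truth labels while the student is trained under injected device variations. A minimal distillation-loss sketch follows; the temperature T, the mixing weight alpha, and the numpy formulation are illustrative assumptions, not the paper's exact hyperparameters.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=np.float64) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard KD objective: alpha * soft-target term + (1 - alpha) * hard-label term."""
    # Soft-target term: cross-entropy between teacher and student
    # distributions at temperature T, rescaled by T^2 so its gradient
    # magnitude stays comparable to the hard-label term.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft = -(p_t * np.log(p_s + 1e-12)).sum(axis=-1).mean() * (T * T)
    # Hard-label term: ordinary cross-entropy with the ground truth.
    p1 = softmax(student_logits)
    hard = -np.log(p1[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * soft + (1 - alpha) * hard
```

In variation-aware training, the student logits would be computed from weights perturbed with the RRAM noise model, so the distilled model learns to stay accurate under the variations it will see after mapping.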


Summary

Introduction

Today, deep neural networks (DNNs) have achieved or even surpassed human-level performance in many fields, such as image recognition [1], natural language processing [2], and robotics. However, GPUs in general consume high energy and computational resources in both training and inference operations [3], [5]. The main driving force of semiconductor design, CMOS technology scaling, is approaching its limit and finds it increasingly difficult to deliver the computation power needed for DNNs. In addition, the conventional CMOS architecture faces the memory wall, i.e., the von Neumann bottleneck [7], which further complicates the challenge of achieving high-performance, energy-efficient computing. In this context, there is an urgent need for hardware acceleration that explores beyond-traditional CMOS technology as well as architectural and algorithmic solutions [6].

