Abstract

Resistive crossbar arrays (RCAs) offer an elegant implementation for accelerating deep neural networks (DNNs). Matrix-vector multiplication (MVM), the cornerstone of DNNs, is carried out in $O(1)$ steps, compared to $O(N^{2})$ steps for digital realizations or $O(\log_{2}(N))$ steps for in-memory associative processors. However, the IR drop problem, caused by the inevitable interconnect wire resistance in RCAs, remains a daunting challenge. In this article, we propose a fast and efficient training and validation framework that incorporates the wire resistance in quantized DNNs, without the need for computationally expensive SPICE simulations during training. A fabricated four-bit Au/Al2O3/HfO2/TiN device is modelled and used within the framework with two mapping schemes to realize the quantized weights. Efficient system-level IR-drop estimation methods are used to accelerate training. SPICE validation results show the effectiveness of the proposed method in capturing the IR drop problem, achieving the baseline accuracy with a 2% and a 4% drop in the worst-case scenario for the MNIST dataset on a multilayer perceptron network and the CIFAR-10 dataset on modified VGG and AlexNet networks, respectively. Other nonidealities, such as stuck-at-fault defects, variability, and aging, are studied. Finally, the design considerations of the neuronal and driver circuits are discussed.
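To make the mapping concrete, the following minimal sketch (not the paper's code) quantizes a weight matrix to 16 levels, maps the levels linearly onto an assumed conductance window, and computes the ideal, IR-drop-free crossbar output as a single matrix-vector product. The conductance bounds, the per-device level count, and the input voltage range are illustrative assumptions.

```python
# Minimal sketch: quantized weights mapped to device conductances, followed by
# an ideal (IR-drop-free) crossbar MVM, I = G^T . V, computed in one analog step.
import numpy as np

G_MIN, G_MAX = 1e-6, 1e-4      # assumed device conductance range (S)
LEVELS = 16                    # four-bit device -> 16 programmable states

def quantize_weights(w):
    """Quantize weights to LEVELS uniform states in [0, 1] (illustrative scheme)."""
    w_norm = (w - w.min()) / (w.max() - w.min() + 1e-12)
    return np.round(w_norm * (LEVELS - 1)) / (LEVELS - 1)

def map_to_conductance(w_q):
    """Linearly map quantized weights onto the assumed conductance window."""
    return G_MIN + w_q * (G_MAX - G_MIN)

def crossbar_mvm(G, v_in):
    """Ideal crossbar: all column currents result from a single MVM."""
    return G.T @ v_in          # shape: (columns,)

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 64))      # rows = inputs, columns = outputs
G = map_to_conductance(quantize_weights(W))
v = rng.uniform(0.0, 0.2, size=128)     # input (word-line) voltages (V)
print(crossbar_mvm(G, v)[:4])           # output currents of the first 4 columns (A)
```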

Highlights

  • Artificial intelligence hardware acceleration has attracted significant interest [1], [2], especially accelerating deep neural networks (DNNs) with in-memory processing, which alleviates the memory-wall bottleneck of the von Neumann computing architecture.

  • The in-memory computing paradigm offers a powerful tool for accelerating artificial intelligence and machine learning algorithms: the matrix-vector multiplication (MVM) can be performed in O(1) steps with resistive crossbar structures and in O(log2(N)) steps with an in-memory associative processor [3], [4], unlike digital implementations that require O(N^2) steps.

  • We experimentally show that the proposed methods capture the IR drop problem, with a 2% and a 4% accuracy drop in the worst-case scenario for the MNIST dataset on a multilayer perceptron network and the CIFAR-10 dataset on modified VGG and AlexNet networks, respectively; a first-order sketch of such an IR-drop estimate follows these highlights.
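The sketch below illustrates one possible fast, system-level IR-drop estimate of the kind referred to above: each cell's conductance is degraded by the wire resistance accumulated along its word line and bit line, a first-order series approximation that ignores current coupling between cells. The segment counting, the wire resistance value, and the array geometry are assumptions, not the paper's exact method.

```python
# Hedged first-order IR-drop model: g_eff = 1 / (1/g + n_segments * r_wire).
# It runs as a plain MVM, so it is cheap enough to embed in a training loop,
# unlike a full SPICE solve of the resistive network.
import numpy as np

def ir_drop_aware_mvm(G, v_in, r_wire=1.0):
    """Approximate column currents of a (rows x cols) crossbar with wire resistance.

    G      : device conductance matrix (S), rows = inputs, cols = outputs
    v_in   : input (word-line) voltages (V)
    r_wire : resistance of one wire segment between adjacent cells (ohm), assumed
    """
    rows, cols = G.shape
    # Assumed geometry: row drivers on the left, column sense nodes at the bottom,
    # so the current of cell (i, j) traverses j+1 word-line segments and
    # rows-i bit-line segments.
    i = np.arange(rows)[:, None]
    j = np.arange(cols)[None, :]
    segments = (j + 1) + (rows - i)
    G_eff = 1.0 / (1.0 / G + segments * r_wire)
    return G_eff.T @ v_in

rng = np.random.default_rng(1)
G = rng.uniform(1e-6, 1e-4, size=(128, 64))
v = rng.uniform(0.0, 0.2, size=128)
ideal = G.T @ v
degraded = ir_drop_aware_mvm(G, v, r_wire=2.0)
print("mean relative current loss:", float(np.mean(1 - degraded / ideal)))
```

In this simplified model, cells far from both the row driver and the column sense node see the largest effective series resistance, which reproduces the qualitative trend that accuracy loss grows with wire resistance and array size.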

Summary

INTRODUCTION

Artificial intelligence hardware acceleration has attracted significant interest [1], [2], especially accelerating deep neural networks (DNNs) with in-memory processing, which alleviates the memory-wall bottleneck of the von Neumann computing architecture. The higher the wire resistance, the higher the accuracy drop. Since SPICE simulation is computationally expensive, it cannot be used for training quantized neural networks (QNNs). Hardware solutions, such as using a 1T1R structure to activate one column at a time, increase the time complexity of MVM to O(N) and require extra hardware to store data [28]. This technique requires a substantial analog memory (or an ADC and memory) to save the analog output currents of each crossbar array, or at least an iterative SPICE simulation of the entire network, which is not practical for large networks and is not even preferable for small ones. Another method is introduced in [30], where an iterative post-processing technique finds the weight matrix under IR drop that is closest to the trained weights; it requires at least seven iterations to reach an MSE of 0.005. We choose the latter approach, random sampling from the measured data, since a Gaussian distribution may not accurately describe the randomness of the device states and the device-to-device variations.
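A minimal sketch of the measured-data sampling idea follows. The measurement pools and level spacing are synthesized here only so the example runs stand-alone; in practice they would hold experimental conductance readouts of the fabricated four-bit device at each programmed level.

```python
# Hedged sketch: instead of drawing device conductances from a fitted Gaussian,
# each programmed state is replaced by a readout resampled (with replacement)
# from a pool of measured conductances for that level, so the simulated spread
# follows the measured distribution whatever its shape.
import numpy as np

rng = np.random.default_rng(42)

# measured[l] stands in for repeated conductance readouts (S) of devices
# programmed to level l; synthetic values are used here purely for illustration.
measured = [rng.normal(1e-6 + l * 6.6e-6, 5e-7, size=200) for l in range(16)]

def sample_conductance(levels):
    """Replace each programmed level with a randomly drawn measured readout."""
    flat = np.array([rng.choice(measured[l]) for l in levels.ravel()])
    return flat.reshape(levels.shape)

levels = rng.integers(0, 16, size=(128, 64))   # four-bit programmed states
G = sample_conductance(levels)                 # conductances with realistic spread
print(G.shape, float(G.mean()))
```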

MVM USING RCAS
RESULTS AND DISCUSSION
EFFECT OF LIMITED RETENTION
DRIVER AND NEURONAL CIRCUITS REQUIREMENTS
CONCLUSION AND FUTURE WORK