Abstract

Deep Learning (DL), a subset of Artificial Intelligence (AI), is growing rapidly, with applications in domains such as speech recognition and computer vision. A Deep Neural Network (DNN), the backbone of DL algorithms, is a directed graph containing multiple layers, with a different number of neurons in each layer. The use of these networks has increased in the last few years due to the availability of large data sets and massive computation power. As DNNs have grown in size, researchers have developed specialized hardware accelerators to reduce inference compute time. One example of such a domain-specific architecture designed for neural network acceleration is the Tensor Processing Unit (TPU), which outperforms GPUs in the inference stage of DNN execution. The heart of this inference engine is a matrix multiplication unit based on a systolic array architecture: a grid-like structure of individual processing elements that can be extended along rows and columns. Due to external environmental factors or internal semiconductor scaling, these systems are prone to faults that lead to incorrect calculations and, thereby, inaccurate decisions by the DNN. Although considerable work has been done on computing-array implementations and their reliability concerns, their fault-tolerance behavior for DNN applications is not well understood. It is not even clear what impact different faults have on accuracy.

In this work, we first study possible mapping strategies for implementing convolution and dense layer weights on a TPU systolic array. Next, we consider various fault scenarios that may occur in the array and divide them into low and high row-fault and column-fault modes with respect to the multiplication unit (Fig. 1(a) pictorially represents column faults). We then study the impact of these fault models on the overall accuracy of the DNN running on a faulty TPU. The goal is to study resiliency and overcome the limitations of earlier work. Previous work, which used weight pruning (removing weights or connections in the DNN) plus retraining, was very effective at masking random faults on the array, but it fails in the case of column faults, as clearly shown in Fig. 1(b). We also propose techniques to mitigate or bypass row and column faults. Our mapping strategy follows physical_x(i) = i % N and physical_y(j) = j % N, where (i, j) is the index of the dense (FC) weight matrix and (physical_x(i), physical_y(j)) is the actual physical location on an array of size N. The convolution filters are linearized with respect to every channel to convert them into a proper weight matrix, which is then mapped according to the same policy.

We show that DNNs can tolerate a certain number of faults in the array while retaining their original accuracy (low row faults). The accuracy of the network decreases with even one column fault if that column is in use. The results show that for the same number of row and column faults, column faults have the greater impact on network accuracy, because pruning an input neuron has far less effect than pruning an output neuron. We experimented with three different networks and found the influence of these faults to be consistent. These faults can be mitigated using techniques such as Matrix Transpose and Array Reduction, which do not require retraining of the weights.
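As a rough illustration of the mapping policy described above, the following Python sketch maps a dense weight matrix with the modulo policy and linearizes convolution filters per channel into a 2-D matrix. The function names, the array size N, and the filter memory layout are assumptions for illustration, not taken from the paper's implementation.

```python
import numpy as np

N = 256  # assumed systolic array dimension; the paper does not fix this value here

def physical_x(i: int) -> int:
    """Physical row on the array for logical weight-matrix row i."""
    return i % N

def physical_y(j: int) -> int:
    """Physical column on the array for logical weight-matrix column j."""
    return j % N

def map_dense_weights(W: np.ndarray) -> dict:
    """Place a dense (FC) weight matrix W[i, j] onto array coordinates
    using the modulo mapping policy."""
    placement = {}
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            placement[(physical_x(i), physical_y(j))] = W[i, j]
    return placement

def linearize_conv_filters(filters: np.ndarray) -> np.ndarray:
    """Flatten convolution filters of shape (out_ch, in_ch, kH, kW) per
    channel into a 2-D weight matrix so the dense policy applies."""
    out_ch = filters.shape[0]
    return filters.reshape(out_ch, -1).T  # rows: in_ch*kH*kW, columns: out_ch

# Example: a 3x3 convolution with 16 input and 32 output channels becomes
# a (16*3*3) x 32 matrix, then mapped with the modulo policy above.
W_conv = linearize_conv_filters(np.random.randn(32, 16, 3, 3))
placement = map_dense_weights(W_conv)
```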
For low row faults, the original mapping policy can be retained so that weights are mapped to their exact locations, which does not affect accuracy. Low column faults can be converted into low row faults by transposing the matrix. In the case of high row (column) faults, the entire row (column) has to be avoided to completely bypass the faulty locations. Static mapping of weights combined with retraining the network on the array can be effective against random faults. Adapting the mapping to structured faults reduces the burden of retraining, which happens outside the TPU. A minimal sketch of these two retraining-free mitigations appears below.
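The sketch below illustrates how the two retraining-free mitigations might look in code. The function names, the fault-set encoding, and the tiling scheme are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def transpose_remap(W: np.ndarray) -> np.ndarray:
    """Matrix Transpose: map W.T instead of W so that a low column fault
    lands on a weight-matrix row (an input neuron) rather than a column
    (an output neuron). Since x @ W == (W.T @ x.T).T, the original product
    is recovered by transposing the operands and the result."""
    return W.T

def array_reduction(W: np.ndarray, faulty_cols: set, n_cols: int) -> list:
    """Array Reduction: bypass heavily faulty physical columns entirely by
    issuing the logical columns of W over only the healthy columns, tiling
    the matrix multiply across extra passes instead of using bad columns."""
    healthy = [c for c in range(n_cols) if c not in faulty_cols]
    step = len(healthy)
    # Each tile occupies only the healthy physical columns in one pass.
    return [W[:, k:k + step] for k in range(0, W.shape[1], step)]

# Example: with physical columns {3, 7} faulty on a 16-column array, a
# 16 x 64 weight matrix needs ceil(64 / 14) = 5 passes instead of 4.
W = np.random.randn(16, 64)
passes = array_reduction(W, faulty_cols={3, 7}, n_cols=16)
```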
