Impact Of Permanent Faults Research Articles

To maximize the performance and energy efficiency of Spiking Neural Network (SNN) processing on resource-constrained embedded systems, specialized hardware accelerators (chips) are employed. However, these SNN chips may suffer from permanent faults that affect the functionality of the weight memory and the neuron behavior, potentially causing significant accuracy degradation and system malfunction. Such permanent faults may stem from manufacturing defects during fabrication and/or from device or transistor damage (e.g., due to wear-out) during run-time operation. Yet the impact of permanent faults in SNN chips, and the corresponding mitigation techniques, have not been thoroughly investigated. Toward this, we propose RescueSNN, a novel methodology to mitigate permanent faults in the compute engine of SNN chips without requiring additional retraining, thereby significantly cutting down design time and retraining costs while maintaining throughput and quality. The key ideas of RescueSNN are (1) analyzing the characteristics of SNNs under permanent faults; (2) leveraging this analysis to improve SNN fault tolerance through effective fault-aware mapping (FAM); and (3) devising lightweight hardware enhancements to support FAM. Our FAM technique leverages the fault map of the SNN compute engine to (i) minimize weight corruption when mapping weight bits onto faulty memory cells, and (ii) selectively employ faulty neurons that do not cause significant accuracy degradation, thereby maintaining accuracy and throughput while accounting for the SNN operations and processing dataflow. The experimental results show that RescueSNN improves accuracy by up to 80% while keeping the throughput reduction below 25% at high fault rates (e.g., 0.5 of the potential fault locations), compared to running SNNs on the faulty chip without mitigation.
In this manner, embedded systems that employ RescueSNN-enhanced chips can efficiently ensure reliable execution against permanent faults throughout their operational lifetime.
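To make idea (i) of FAM more concrete, here is a minimal sketch. It is not the RescueSNN implementation. It assumes 8-bit unsigned weights stored one per memory word, a known fault map listing the stuck-at cells of that word, and a hypothetical remapping scheme that circularly rotates the weight bits so that faulty cells end up holding the least significant bits. The names `WORD_BITS`, `rotation_cost`, `best_shift`, `store`, and `load` are illustrative only.

```python
from typing import Dict, List

WORD_BITS = 8                      # assumption: 8-bit weights, one weight per memory word
MASK = (1 << WORD_BITS) - 1


def rotation_cost(faulty_cells: List[int], shift: int) -> int:
    """Worst-case corruption if logical bit b is stored in physical cell (b + shift) % WORD_BITS.

    A faulty physical cell p then holds logical bit (p - shift) % WORD_BITS,
    so a fault there can change the weight by at most 2 ** that index.
    """
    return sum(1 << ((p - shift) % WORD_BITS) for p in faulty_cells)


def best_shift(faulty_cells: List[int]) -> int:
    """Hypothetical FAM step: pick the rotation that puts faulty cells on the LSBs."""
    return min(range(WORD_BITS), key=lambda s: rotation_cost(faulty_cells, s))


def store(weight: int, shift: int, stuck_at: Dict[int, int]) -> int:
    """Rotate the weight left by `shift`, then let stuck-at cells override the written bits."""
    word = ((weight << shift) | (weight >> (WORD_BITS - shift))) & MASK
    for cell, value in stuck_at.items():
        word = (word & ~(1 << cell)) | (value << cell)
    return word


def load(word: int, shift: int) -> int:
    """Undo the rotation when the compute engine reads the weight back."""
    return ((word >> shift) | (word << (WORD_BITS - shift))) & MASK


if __name__ == "__main__":
    faults = {7: 0}                            # hypothetical fault map: MSB cell stuck-at-0
    naive = load(store(200, 0, faults), 0)     # unmitigated mapping
    s = best_shift(list(faults))
    mapped = load(store(200, s, faults), s)    # fault-aware mapping
    print(f"naive={naive}, shift={s}, mapped={mapped}")
```

With this fault map, the unmitigated read returns 72 instead of 200. The rotated mapping (shift 7) places logical bit 0, which is 0 for the weight 200, in the stuck cell, so the weight survives intact. A single rotation per word only bounds the worst-case error. It does not cover part (ii), the selective use of faulty neurons.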


Approximate computing is known to enhance the energy efficiency of deep neural network accelerators by introducing inexactness with a tolerable accuracy loss. However, small accuracy variations may increase the sensitivity of these accelerators to subtle undesired disturbances, such as permanent faults. The impact of permanent faults in accurate deep neural network (AccDNN) accelerators has been thoroughly investigated in the literature. Conversely, the impact of permanent faults, and their mitigation, in approximate DNN (AxDNN) accelerators remains vastly under-explored. Towards this, we first present an extensive fault resilience analysis of approximate multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs) using the state-of-the-art EvoApprox8b multipliers in GPU and TPU accelerators. Then, we propose a novel fault mitigation method, fault-aware retuning of weights (Fal-reTune), which retunes the weights using a weight mapping function in the presence of faults to improve classification accuracy. To evaluate the fault resilience and the effectiveness of the proposed mitigation method, we use the widely adopted MNIST, Fashion-MNIST, and CIFAR10 datasets. Our results demonstrate that permanent faults exacerbate the accuracy loss in AxDNNs compared to AccDNN accelerators. For instance, a permanent fault in AxDNNs can lead to a 56% accuracy loss, whereas the same faulty bit leads to only a 4% accuracy loss in AccDNN accelerators. We empirically show that Fal-reTune improves the performance of AxDNNs up to 98%, even at fault rates of up to 50%. Furthermore, we observe that the fault resilience of AxDNNs is orthogonal to their energy efficiency.
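The resilience analysis above depends on injecting permanent faults into the accelerator's multipliers. The sketch below shows one common way to model this: a stuck-at fault on a single output bit of an 8×8-bit multiplier, with the mean absolute error measured over all input pairs. It is a simplification under stated assumptions. `truncated_mul` is a hypothetical approximate multiplier that drops operand LSBs and is not one of the EvoApprox8b circuits. The faulty bit position is arbitrary, and the metric is arithmetic error rather than classification accuracy.

```python
from typing import Callable

Multiplier = Callable[[int, int], int]


def exact_mul(a: int, b: int) -> int:
    """Accurate 8-bit x 8-bit unsigned multiplier (16-bit product)."""
    return a * b


def truncated_mul(a: int, b: int, drop: int = 4) -> int:
    """Hypothetical approximate multiplier: zero the `drop` LSBs of each operand."""
    keep = ~((1 << drop) - 1) & 0xFF
    return (a & keep) * (b & keep)


def with_stuck_at(mul: Multiplier, bit: int, value: int) -> Multiplier:
    """Wrap a multiplier so that product bit `bit` is permanently stuck at `value` (0 or 1)."""
    def faulty(a: int, b: int) -> int:
        p = mul(a, b)
        return (p & ~(1 << bit)) | (value << bit)
    return faulty


def mean_abs_error(mul: Multiplier) -> float:
    """Mean |mul(a, b) - a * b| over all 256 x 256 unsigned 8-bit input pairs."""
    total = sum(abs(mul(a, b) - a * b) for a in range(256) for b in range(256))
    return total / (256 * 256)


if __name__ == "__main__":
    for name, mul in [("exact", exact_mul), ("approx", truncated_mul)]:
        clean = mean_abs_error(mul)
        with_fault = mean_abs_error(with_stuck_at(mul, bit=10, value=1))
        print(f"{name}: fault-free MAE={clean:.1f}, stuck-at-1 on bit 10 MAE={with_fault:.1f}")
```

Comparing the two rows shows how the same permanent fault interacts with an already-inexact multiplier. A study like the one summarized above would propagate such faulty products through whole MLPs and CNNs and report classification accuracy instead.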


Articles published on Impact Of Permanent Faults

RescueSNN: enabling reliable executions on spiking neural network accelerators under permanent faults

Exposing Reliability Degradation and Mitigation in Approximate DNNs Under Permanent Faults

Handling Physical-Layer Deadlock Caused by Permanent Faults in Quasi-Delay-Insensitive Networks-on-Chip

Exploring the Interaction Between Device Lifetime Reliability and Security Vulnerabilities
