Analyzing the Soft Error Reliability of Convolutional Neural Networks on Graphics Processing Unit

Khalid Adam, Izzeldin I Mohd, Younis Ibrahim

doi:10.1088/1742-6596/1933/1/012045

Open Access

https://doi.org/10.1088/1742-6596/1933/1/012045

Abstract

Convolutional Neural Networks (CNNs) are used extensively in safety-critical applications. GPUs are currently the dominant DNN accelerators, used to speed up the execution of CNN models and reduce their latency. However, GPUs are prone to soft errors, which can dramatically alter GPU behavior: the resulting faults may corrupt data values or logic operations and cause errors such as Silent Data Corruption (SDC). Unfortunately, soft errors propagate from the physical level (the GPU) to the application level (the CNN model). This paper analyzes the reliability of the AlexNet model to identify which parts of the model are more vulnerable to soft errors. To this end, we injected faults into AlexNet running on an NVIDIA GPU, using the SASSIFI fault injector as the main evaluation tool. The experiments demonstrate a large reduction in SDC errors in the Im2col kernel, from 9.3% to 0.00% for STORE and from 5.0% to 0.00% for GPR. In the Add_bias kernel, SDC errors for the STORE and GPR instructions fell from 0.8% to 0.00% and from 1.2% to 0.1%, respectively.
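
To make the notion of a soft error propagating to the application level concrete, the following is a minimal illustrative sketch in Python. It is not the SASSIFI workflow used in the paper, which injects faults at the GPU instruction level. It assumes a soft error can be modeled as a single bit flip in a float32 value, and it assumes a run counts as an SDC when the predicted class differs from the fault-free prediction. The function names `flip_bit` and `outcome` and the toy score vector are hypothetical.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE 754 float32 encoding inverted."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    # Reinterpret the float32 as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Invert the chosen bit and reinterpret the result as a float32.
    (corrupted,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return corrupted


def outcome(scores: list[float], index: int, bit: int) -> str:
    """Inject one bit flip into scores[index] and compare predicted classes.

    Assumption: 'SDC' means the top-1 class changed without any error
    being raised; 'masked' means the prediction is unchanged.
    """
    golden = max(range(len(scores)), key=scores.__getitem__)
    faulty_scores = list(scores)
    faulty_scores[index] = flip_bit(faulty_scores[index], bit)
    faulty = max(range(len(faulty_scores)), key=faulty_scores.__getitem__)
    return "masked" if faulty == golden else "SDC"


# Hypothetical class scores from a fault-free run (class 1 wins).
scores = [0.1, 0.7, 0.2]
print(outcome(scores, index=0, bit=0))   # flip in the lowest mantissa bit
print(outcome(scores, index=0, bit=30))  # flip in the highest exponent bit
```

In this toy setting, a flip in a low mantissa bit barely changes the value and is masked, while a flip in a high exponent bit inflates the value enough to change the predicted class. This is only a sketch of why some faults are harmless and others surface as SDCs.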

Full Text