Abstract

Convolutional Neural Networks (CNNs) are increasingly deployed on resource-constrained edge devices, including in safety-critical applications. However, recent studies demonstrate that soft errors may cause CNN prediction failures. Many approaches insert checks at the end of each layer to detect computation errors, but they either incur severe overheads or cannot detect errors in memory; for example, traditional Double Modular Redundancy (DMR) can protect data integrity in memory, but it incurs significant time and memory overheads. Some existing DNN-specific approaches protect DNNs from transient faults at low cost but cannot provide high fault coverage against permanent faults in memory. We therefore propose two low-cost software approaches, a channel-based weight checksum and a layer-based output checksum, which protect the integrity of weights and layer outputs in memory. We perform a large number of fault injections to evaluate the proposed approaches and compare them with existing ones. The results show that combining the two approaches provides approximately 100% fault coverage for weights and layer outputs in memory, while increasing total time and memory overheads by only 4% and 17%, respectively, for DNNs that run sequentially. In addition, we observe that the layer-based output checksum improves the error resilience of register files, because most silent data corruptions (SDCs) caused by register-file errors are induced by control-flow and pointer errors. Finally, we present a case study on MobileNet to show that our approaches generalize to DNN libraries that run in parallel by dividing a single layer's execution into N parts; when N is 2, our approaches increase total time and memory overheads by 7.4% and 31%, respectively.
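To make the mechanism concrete, below is a minimal sketch of the channel-based weight checksum in Python/NumPy. The function names, tensor layout, and tolerance are illustrative assumptions rather than the paper's actual implementation; the layer-based output checksum works analogously, checksumming each layer's output buffer after it is written and verifying it before the next layer reads it.

    import numpy as np

    def compute_channel_checksums(weights):
        # weights: convolution kernel of shape (out_channels, in_channels, kH, kW)
        # (layout is an assumption for this sketch). Sum the weights of each
        # output channel to obtain one checksum per channel.
        return weights.reshape(weights.shape[0], -1).sum(axis=1)

    def verify_channel_checksums(weights, golden, atol=1e-5):
        # Recompute the per-channel sums and compare against the golden values
        # recorded at load time; a mismatch flags a corrupted channel in memory.
        current = compute_channel_checksums(weights)
        return np.where(~np.isclose(current, golden, atol=atol))[0]

    # Usage: detect a simulated memory fault in a single weight.
    rng = np.random.default_rng(0)
    w = rng.standard_normal((8, 3, 3, 3)).astype(np.float32)
    golden = compute_channel_checksums(w)
    w[5, 0, 1, 1] += 1.0                        # inject a fault into channel 5
    print(verify_channel_checksums(w, golden))  # -> [5]

In practice the golden checksums would be computed once when the weights are loaded and stored apart from the weight tensors, so that a fault corrupting a channel between verifications is caught before inference uses it.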
