Abstract

Context: Soft errors are one of the main sources of failure in computer systems. The main drawback of state-of-the-art software-based methods for detecting and handling soft errors is their high performance overhead. Problem: In a computer program, different data and instructions affect program behavior differently and hence have different sensitivity to errors. A central research challenge in this field is determining which data and instructions of a program, as its sensitive sections, should be duplicated to protect against soft errors. Method: We propose an error detection method that precisely identifies and duplicates the most sensitive data in a program. The method first quantifies the sensitivity of program data by analyzing features such as dependency and lifetime. Second, the most sensitive data are relocated in the source code to increase the rate of error masking by the program. Finally, a small percentage of the identified sensitive data are duplicated to provide error detection capability. Results: The results of extensive fault-injection experiments confirm that relocating and duplicating only the 30% most sensitive data identified by the proposed method enables the program to detect 80% of failure-causing errors with 23.99% performance overhead. Furthermore, the relocation performed in the source code by the proposed method improves the rate of error masking by 6%.
