Fault injection (FI) is a commonly used experimental technique to evaluate the resilience of software techniques for tolerating hardware faults. Software-implemented FI can be performed at different levels of abstraction in the system stack; FI performed at the compiler’s intermediate representation (IR) level has the advantage that it is closer to the program being evaluated and is hence easier to derive insights from for the design of software fault-tolerance mechanisms. Unfortunately, it is not clear how accurate IR-level FI is vis-a-vis FI performed at the assembly code level, and prior work has presented contradictory findings. In this article, we perform a comprehensive evaluation of the accuracy of IR-level FI across a range of benchmark programs and compiler optimization levels. Our results show that IR-level FI is as accurate as assembly-level FI for silent data corruption (SDC) probability estimation across different benchmarks and optimization levels. Further, we present a machine-learning-based technique for improving the accuracy of <i>crash</i> probability measurements made by IR-level FI, which takes advantage of an observed correlation between program crash probabilities and instructions that operate on memory address values. We find that the machine learning technique provides comparable accuracy for IR-level FI as assembly code level FI for program crashes.
Read full abstract