Radiation-induced transient errors cause bit-flips in flip-flops (flip-flop soft errors). Modeling the resilience of a target system to flip-flop soft errors is a crucial step toward a cost-effective error resilience solution. This step often requires significant time and effort because it relies on a large number of fault injection simulations. As technology scales, the required effort grows in a new dimension with the increased probability of multi-bit upsets (MBUs). In this work, we present a new estimation model that predicts the error resilience levels for flip-flop MBU cases. The model requires only the measured soft error effects of single-bit upset (SBU) cases. It uses two strategies to address how multiple bit-flips that occur simultaneously in a system affect the outcome of application execution. We evaluate the accuracy of the MBU estimation model against actual fault injection results on two different processor cores. The two main strategies in our estimation model improve accuracy by more than 7×.