Abstract
Low-power fault tolerance design techniques trade reliability for a lower area cost and power overhead in integrated circuits by protecting only a subset of their workload or their most vulnerable parts. However, in the presence of faults, not all workloads are equally susceptible to errors. In this paper, we present a low-power fault tolerance design technique that selects and protects the most susceptible workload. We propose to rank workload susceptibility by the likelihood that any error bypasses the logic masking of the circuit and propagates to its outputs. The susceptible workload is protected by a partial Triple Modular Redundancy (TMR) scheme. We evaluate the proposed technique on timing-independent and timing-dependent errors induced by permanent and transient faults. In comparison with an unranked selective fault tolerance approach, we demonstrate a) similar error coverage with a 39.7% average reduction in area overhead, or b) an 86.9% average improvement in error coverage for a similar area overhead. For the same area overhead, we observe an error coverage improvement of 53.1% and 53.5% against permanent stuck-at and transition faults, respectively, and an average error coverage improvement of 151.8% and 89.0% against timing-dependent and timing-independent transient faults, respectively. Compared to TMR, the proposed technique achieves an area and power overhead reduction of 145.8% to 182.0%.
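As a rough illustration of ranking workload by susceptibility, the sketch below uses a brute-force stand-in rather than the paper's model. It runs exhaustive single stuck-at fault simulation on a toy NAND netlist with the topology of the ISCAS-85 c17 benchmark. A pattern's susceptibility is taken to be the fraction of faults whose error escapes logic masking and reaches a primary output. The top-k patterns are then "protected" by a 2-of-3 majority voter. The netlist, the stem-only fault list, `k`, and the pattern-level modeling of partial TMR are all assumptions made for illustration, and every function name is hypothetical.

```python
from itertools import product

# Toy gate-level netlist (topology of the ISCAS-85 c17 benchmark): every gate is
# a 2-input NAND. Entries map output net -> (input_a, input_b), in topological order.
NETLIST = {
    "n10": ("i1", "i3"),
    "n11": ("i3", "i6"),
    "n16": ("i2", "n11"),
    "n19": ("n11", "i7"),
    "o22": ("n10", "n16"),
    "o23": ("n16", "n19"),
}
INPUTS = ["i1", "i2", "i3", "i6", "i7"]
OUTPUTS = ["o22", "o23"]
# Single stuck-at faults on every net (stem faults only; fan-out branches ignored).
FAULTS = [(net, v) for net in INPUTS + list(NETLIST) for v in (0, 1)]


def simulate(pattern, fault=None):
    """Evaluate the outputs for one input pattern, optionally under a (net, value) stuck-at fault."""
    values = dict(zip(INPUTS, pattern))
    if fault is not None and fault[0] in values:
        values[fault[0]] = fault[1]
    for net, (a, b) in NETLIST.items():
        values[net] = 1 - (values[a] & values[b])  # NAND
        if fault is not None and fault[0] == net:
            values[net] = fault[1]
    return tuple(values[o] for o in OUTPUTS)


def susceptibility(pattern):
    """Fraction of single stuck-at faults whose error escapes logic masking and reaches an output."""
    golden = simulate(pattern)
    return sum(simulate(pattern, f) != golden for f in FAULTS) / len(FAULTS)


def rank_workload(patterns, k):
    """Return the k input patterns with the highest susceptibility."""
    return sorted(patterns, key=susceptibility, reverse=True)[:k]


def majority(a, b, c):
    """Bitwise 2-of-3 majority voter, as used in TMR."""
    return (a & b) | (a & c) | (b & c)


def partial_tmr_output(pattern, fault, protected):
    """Outputs when one copy of the circuit carries `fault`.

    Protected patterns are voted over three replicas (one faulty, two fault-free);
    all other patterns are served by the single, unprotected copy.
    """
    faulty = simulate(pattern, fault)
    if pattern not in protected:
        return faulty
    good = simulate(pattern)
    return tuple(majority(f, g, g) for f, g in zip(faulty, good))


if __name__ == "__main__":
    patterns = list(product((0, 1), repeat=len(INPUTS)))
    protected = set(rank_workload(patterns, k=8))
    for p in sorted(protected):
        print(p, round(susceptibility(p), 3))
```

Modeling protection per input pattern abstracts away how the hardware recognizes a protected pattern. The sketch only shows why voting masks a single faulty replica for the selected workload, while the remaining patterns keep the unprotected behavior.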
Highlights
Despite its advantages, technology scaling has degraded the reliability of devices
We showed that not all workloads are equally susceptible to errors induced by permanent or transient faults: some input patterns are less protected by the inherent logic masking of the circuit (Table 1)
By combining Selective Fault Tolerance (Fig. 1) with a probabilistic fault model based on the theory of output deviations (Fig. 2), we proposed a low-power selective fault tolerance design technique (Figs. 3 and 4)
Summary
Despite its advantages, technology scaling has degraded the reliability of devices. Output deviations (OD) were introduced in [26] as an RT-level fault model calibrated with technology failure information that stems from technology reliability characterization, such as inductive fault analysis [9]. This model is used to select the input patterns that maximize the probability of propagating an erroneous response to the primary outputs. We present a novel low-power fault tolerance design technique, applicable at the register-transfer level, that selects and protects the most susceptible workload on the most susceptible logic cones by targeting both timing-independent and timing-dependent errors. Preliminary results of this technique were presented in [10], where only timing-independent errors induced by stuck-at faults and input bit-flips were considered.
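The output-deviation idea can be sketched at gate level under simplifying assumptions. Each 2-input NAND gate is given a hypothetical confidence level (the probability of a correct output) for each input combination. Signal probabilities are propagated as if all nets were independent, so reconvergent fan-out is ignored. A pattern's output deviation is the probability that a primary output differs from its fault-free value. Patterns are then ranked by the sum of their output deviations. The c17-style netlist, the confidence values, the summing rule, and all names are illustrative assumptions. They are not the calibrated RT-level model of [26].

```python
from itertools import product

# Hypothetical c17-style netlist of 2-input NAND gates, in topological order:
# output net -> (input_a, input_b).
NETLIST = {
    "n10": ("i1", "i3"),
    "n11": ("i3", "i6"),
    "n16": ("i2", "n11"),
    "n19": ("n11", "i7"),
    "o22": ("n10", "n16"),
    "o23": ("n16", "n19"),
}
INPUTS = ["i1", "i2", "i3", "i6", "i7"]
OUTPUTS = ["o22", "o23"]

# Hypothetical confidence levels: probability that a NAND gate produces the correct
# output for each input combination (a, b). An OD model would calibrate these from
# technology failure data; the numbers here are made up for illustration.
CONFIDENCE = {(0, 0): 0.999, (0, 1): 0.995, (1, 0): 0.995, (1, 1): 0.990}


def nand(a, b):
    return 1 - (a & b)


def prob_one(pattern):
    """Propagate P(net = 1) through the netlist, assuming independent signals."""
    p1 = {net: float(bit) for net, bit in zip(INPUTS, pattern)}
    for net, (a, b) in NETLIST.items():
        total = 0.0
        for x, y in product((0, 1), repeat=2):
            weight = (p1[a] if x else 1 - p1[a]) * (p1[b] if y else 1 - p1[b])
            r = CONFIDENCE[(x, y)]
            # The gate outputs nand(x, y) with probability r, its complement otherwise.
            total += weight * (r if nand(x, y) else 1 - r)
        p1[net] = total
    return p1


def fault_free(pattern):
    """Deterministic (fault-free) value of every net."""
    values = dict(zip(INPUTS, pattern))
    for net, (a, b) in NETLIST.items():
        values[net] = nand(values[a], values[b])
    return values


def output_deviations(pattern):
    """Probability that each primary output differs from its fault-free value."""
    p1, good = prob_one(pattern), fault_free(pattern)
    return {o: (1 - p1[o]) if good[o] else p1[o] for o in OUTPUTS}


def most_susceptible(patterns, k):
    """Rank patterns by total output deviation and keep the top k."""
    return sorted(patterns, key=lambda p: sum(output_deviations(p).values()),
                  reverse=True)[:k]


if __name__ == "__main__":
    patterns = list(product((0, 1), repeat=len(INPUTS)))
    for p in most_susceptible(patterns, k=4):
        print(p, {o: round(d, 4) for o, d in output_deviations(p).items()})
```

The design choice here is cost. The ranking needs one probabilistic pass per input pattern and no per-fault simulation, at the price of the independence approximation.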