Abstract

Low power fault tolerance design techniques trade reliability to reduce the area cost and the power overhead of integrated circuits by protecting only a subset of their workload or their most vulnerable parts. However, in the presence of faults not all workloads are equally susceptible to errors. In this paper, we present a low power fault tolerance design technique that selects and protects the most susceptible workload. We propose to rank the workload susceptibility as the likelihood of any error to bypass the logic masking of the circuit and propagate to its outputs. The susceptible workload is protected by a partial Triple Modular Redundancy (TMR) scheme. We evaluate the proposed technique on timing-independent and timing-dependent errors induced by permanent and transient faults. In comparison with unranked selective fault tolerance approach, we demonstrate a) a similar error coverage with a 39.7% average reduction of the area overhead or b) a 86.9% average error coverage improvement for a similar area overhead. For the same area overhead case, we observe an error coverage improvement of 53.1% and 53.5% against permanent stuck-at and transition faults, respectively, and an average error coverage improvement of 151.8% and 89.0% against timing-dependent and timing-independent transient faults, respectively. Compared to TMR, the proposed technique achieves an area and power overhead reduction of 145.8% to 182.0%.

Highlights

  • Reliability of devices has been affected by technology scaling despite its advantages

  • We showed that not every workload is susceptible to errors induced by permanent or transient faults, which results in some input patterns being less protected by the inherent logic masking of the circuit (Table 1)

  • By combining the technique of Selective Fault Tolerance (Fig. 1) and a probabilistic fault model based on the theory of output deviations (Fig. 2), we proposed a low power selective fault tolerance design technique (Figs. 3 and 4)

Read more

Summary

Introduction

Reliability of devices has been affected by technology scaling despite its advantages. Output deviations (OD) were introduced in [26] as an RT-Level fault model calibrated through technology failure information that stems from technology reliability characterization, such as inductive fault analysis [9] This model is utilized for selecting the input patterns that maximize the probability of propagating an erroneous response to the primary outputs. We present a novel low power fault tolerance design technique applicable at the register-transferlevel, that selects and protects the most susceptible workload on the most susceptible logic cones by targeting both timing-independent and timing-dependent errors. Preliminary results of this technique were presented in [10], where only the timing-independent errors induced by stuck-at faults and input bit-flips were considered.

Motivation
Selective Fault Tolerance
Probabilistic Fault Model
Workload Types
Motivational Example
PSFT Design
Proposed PSFT Design Flow
Process 1 : Logic Cone Selection
Process 2 : Pattern Ranking and Pattern Selection
Timing-independent Pattern Ranking for Uncorrelated Workload
Experimental Validation
Evaluation Setup
Simulation Results
Computational Effort Comparison
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call