Fault tolerance on-chip: a reliable computing paradigm using self-test, self-diagnosis, and self-repair (3S) approach

Xiaowei Li, Jing Ye, Guihai Yan, Ying Wang

doi:10.1007/s11432-017-9290-4

Abstract

If your computer crashes, you can revive it by rebooting, an empirical solution that usually turns out to be effective. The rationale behind this solution is that transient faults, in either hardware or software, can be cleared by refreshing the machine state. Such a “silver bullet”, however, could become futile in the future, because many faults, especially those residing in hardware such as Integrated Circuit (IC) chips, cannot be eliminated by refreshing. What we need is a more sophisticated mechanism to steer the system back on track. The “magic cure” is the Fault Tolerance On-Chip (FTOC) mechanism, which relies on a suite of built-in design-for-reliability logic, including fault detection, fault diagnosis, and error recovery, working in a self-supportive manner. We have exploited FTOC to build a holistic solution, ranging from on-chip fault detection to error recovery mechanisms, that addresses faults caused by progressive chip aging. Besides fault detection, the FTOC paradigm provides further benefits, such as facilitating graceful performance degradation, mitigating the impact of verification blind spots, and improving chip yield.
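The abstract describes the self-test, self-diagnosis, and self-repair (3S) loop only at a high level. The following toy Python sketch is our own illustration, not the authors' FTOC design. It assumes hypothetical 8-bit adder units with injected permanent stuck-at-0 faults (standing in for aged circuits), uses fixed test vectors with golden responses as the self-test, treats "which units fail" as the diagnosis, and models self-repair as remapping logical slots onto healthy spare units. When spares run out, it exposes fewer slots, a simple form of the graceful performance degradation mentioned above. The names `Adder`, `TEST_VECTORS`, `self_test`, `self_diagnose`, and `self_repair` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical functional unit: an 8-bit adder that may carry a permanent
# stuck-at-0 fault on one output bit (a toy stand-in for an aged circuit).
@dataclass
class Adder:
    uid: int
    stuck_bit: Optional[int] = None  # output bit forced to 0, or None if healthy

    def add(self, a: int, b: int) -> int:
        result = (a + b) & 0xFF
        if self.stuck_bit is not None:
            result &= ~(1 << self.stuck_bit) & 0xFF
        return result

# Self-test: apply fixed test vectors and compare with golden responses.
# 0x55 + 0xAA = 0xFF drives every output bit to 1, so any stuck-at-0 is exposed.
TEST_VECTORS = [(0, 0), (1, 1), (0x55, 0xAA), (0x7F, 0x01), (0xFF, 0x01)]

def self_test(unit: Adder) -> bool:
    return all(unit.add(a, b) == (a + b) & 0xFF for a, b in TEST_VECTORS)

# Self-diagnosis: locate which physical units fail their self-test.
def self_diagnose(units: List[Adder]) -> List[int]:
    return [u.uid for u in units if not self_test(u)]

# Self-repair: map logical slots onto healthy physical units (spares included).
# If too few healthy units remain, expose fewer slots (graceful degradation).
def self_repair(units: List[Adder], n_logical: int) -> Dict[int, int]:
    faulty = set(self_diagnose(units))
    healthy = [u.uid for u in units if u.uid not in faulty]
    return {slot: healthy[slot] for slot in range(min(n_logical, len(healthy)))}

# Usage: 6 physical adders back 4 logical slots, leaving 2 spares.
units = [Adder(uid=i) for i in range(6)]
units[1].stuck_bit = 3
units[3].stuck_bit = 0
print(self_diagnose(units))             # [1, 3]
print(self_repair(units, n_logical=4))  # {0: 0, 1: 2, 2: 4, 3: 5}
units[5].stuck_bit = 7                  # a third unit ages out; spares exhausted
print(self_repair(units, n_logical=4))  # {0: 0, 1: 2, 2: 4}
```

Real on-chip mechanisms would detect faults with hardware checkers or periodic built-in self-test rather than a software loop; the sketch only shows how the three steps feed one another.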

Journal: Science China Information Sciences
Publication Date: May 24, 2018
Citations: 4

Similar Papers

Fault-Diagnosis Systems
Rolf Isermann
01 Jan 2006

Underlying and Persistence Fault Diagnosis in Wireless Sensor Networks Using Majority Neighbors Co-ordination Approach
Rakesh Ranjan Swain ... Sourav Kumar Bhoi
Wireless Personal Communications | VOL. 111
17 Oct 2019

Machine Learning–Based Fault Detection and Diagnosis of Organic Rankine Cycle System for Waste-Heat Recovery
Jiangfeng Wang ... Zhilong He
Journal of Energy Engineering | VOL. 147
01 Aug 2021

Zero-performance-overhead online fault detection and diagnosis in 3D stacked integrated circuits
Saleh Safiruddin ... Sorin Dan Cotofana
04 Jul 2012