Abstract
While multicore platforms promise significant speedup for many current applications, they also suffer from increased reliability problems as a result of continued device scaling. The projected rise in fault rates, together with the diverse ways in which faults manifest, argues for highly efficient solutions that deliver full fault resilience. Traditional duplication and checkpointing strategies typically impose sizable overhead, either in checkpointing execution results or in constantly synchronizing two threads for value checking. To reduce such overhead while still delivering full fault resilience, we propose an integrated fault detection and checkpointing framework in which the comparison and checkpointing process is performed at the cache-memory interface. By sharing a single cache between two duplicated threads, execution results can be verified directly in the cache before being written back, thus strictly protecting memory against execution faults. Meanwhile, because unconfirmed data are allowed to be written into the cache, one thread can run well ahead of the other, relaxing the straitjacket of the strict execution synchronization model. If a cache block is updated frequently, further synchronization relaxation can be achieved by extending the cache design to duplicate the block and skip comparison of the intermediate values.
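To make the compare-at-writeback idea concrete, the following is a minimal conceptual sketch, not the paper's hardware design: the class names, the leading/trailing write methods, and the rollback behavior are all hypothetical illustrations of how unverified data from the leading thread could sit in a shared cache until the trailing thread's result confirms it, with only verified blocks ever reaching memory.

```python
# Conceptual sketch (assumed names, not the authors' implementation):
# a shared cache block that holds the leading thread's unverified result
# and releases it to memory only after the trailing thread's value matches.

class FaultDetected(Exception):
    """Raised when the two redundant threads disagree at the cache-memory interface."""
    pass


class SharedCacheBlock:
    def __init__(self):
        self.value = None        # data deposited by the leading thread
        self.verified = False    # becomes True once the trailing thread confirms it

    def write_leading(self, value):
        # The leading thread may store unconfirmed data and keep running ahead.
        self.value = value
        self.verified = False

    def write_trailing(self, value):
        # The trailing thread's result is compared against the stored value;
        # a mismatch signals a fault before anything escapes the cache.
        if value != self.value:
            raise FaultDetected("mismatch detected at the cache-memory interface")
        self.verified = True

    def write_back(self, memory, addr):
        # Only verified data may be written back, so memory stays fault-free.
        if not self.verified:
            raise RuntimeError("unverified block cannot be written back to memory")
        memory[addr] = self.value


# Hypothetical usage: the leading thread runs ahead, the trailing thread verifies.
memory = {}
block = SharedCacheBlock()
block.write_leading(42)       # leading thread writes an unconfirmed result
block.write_trailing(42)      # trailing thread matches, so the block is verified
block.write_back(memory, 0x100)
```

In this sketch the verification point doubles as the checkpoint: a mismatch would trigger rollback to the last verified state, while frequent updates to the same block could be handled, as described above, by duplicating the block and comparing only the final value rather than every intermediate one.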