On providing scalable self-healing adaptive fault-tolerance to RTR SoCs

Byron Navas, Johnny Oberg, Ingo Sander

doi:10.1109/reconfig.2014.7032541

Abstract

The dependability of heterogeneous many-core FPGA-based systems is threatened by higher failure rates caused by disruptive scales of integration, increased design complexity, and radiation sensitivity. Triple-modular redundancy (TMR) and run-time reconfiguration (RTR) are traditional fault-tolerant (FT) techniques used to increase dependability. However, hardware redundancy is expensive, and most approaches have poor scalability, flexibility, and programmability. Therefore, innovative solutions are needed to reduce the redundancy cost while still preserving acceptable levels of dependability. In this context, this paper presents the implementation of a self-healing adaptive fault-tolerant SoC that reuses RTR IP-cores in order to self-assemble different TMR schemes during run-time. The presented system demonstrates the feasibility of the Upset-Fault-Observer concept, which provides a run-time self-test and recovery strategy that delivers fault-tolerance over functions accelerated in RTR cores, while reducing the redundancy scalability cost by running periodic reconfigurable TMR scan-cycles. In addition, this paper experimentally evaluates the trade-offs of the implemented reconfigurable TMR schemes by characterizing important fault-tolerance metrics, i.e., recovery time (self-repair and self-replicate), detection latency, self-assembly latency, throughput reduction, and increase of physical resources.
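
As a rough illustration of the two ideas the abstract combines, TMR voting and a periodic scan over cores, the following is a minimal software sketch. It is not the paper's hardware implementation: the function names (`majority_vote`, `faulty_replica`, `scan_cycle`), the use of Python callables to stand in for RTR cores, and the idea of checking each core against two freshly configured reference replicas are all assumptions made only for illustration.

```python
from typing import Callable, List, Optional, Sequence


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three replica outputs."""
    return (a & b) | (a & c) | (b & c)


def faulty_replica(a: int, b: int, c: int) -> Optional[int]:
    """Return the index (0, 1 or 2) of the single replica that disagrees
    with the voted value, or None if all agree or no single culprit exists."""
    voted = majority_vote(a, b, c)
    bad = [i for i, out in enumerate((a, b, c)) if out != voted]
    return bad[0] if len(bad) == 1 else None


def scan_cycle(
    cores: Sequence[Callable[[int], int]],
    references: Sequence[Callable[[int], int]],
    stimulus: int,
) -> List[int]:
    """Hypothetical scan cycle: visit each core in turn, form a temporary
    TMR group with two replicas of its reference function, and collect the
    indices of cores whose output is outvoted."""
    flagged = []
    for idx, core in enumerate(cores):
        ref = references[idx]
        # Replica 0 is the core under test; replicas 1 and 2 model the
        # two spare cores assumed to be reconfigured with the same function.
        if faulty_replica(core(stimulus), ref(stimulus), ref(stimulus)) == 0:
            flagged.append(idx)
    return flagged
```

In this sketch only one temporary TMR group exists at a time, which mirrors the cost argument in the abstract: redundancy is shared across cores through scanning rather than permanently tripling every core. How the real system schedules scans, votes, and repairs is described in the paper itself, not here.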

Similar Papers

The upset-fault-observer: A concept for self-healing adaptive fault tolerance
Byron Navas ... Johnny Oberg
01 Jul 2014

Side-Channel Attacks on Triple Modular Redundancy Schemes
Felipe Almeida ... Levent Aksoy
01 Nov 2021

A Fault Tolerant Voter Circuit for Triple Modular Redundant System
Mohammed Hadifur Rahman
Journal of Electrical and Electronic Engineering | VOL. 5
01 Jan 2017

Susceptible workload driven selective fault tolerance using a probabilistic fault model
Mauricio D Gutierrez ... Vasileios Tenentes
01 Jul 2016
