Abstract

In this chapter, we present for the first time (a) a systematic and holistic method for realising on-demand fault tolerance on Tightly Coupled Processor Arrays (TCPAs) rather than on single processors. We propose (b) different levels of replication, i.e., no replication, Dual Modular Redundancy (DMR), and Triple Modular Redundancy (TMR), each with different error-handling capabilities for TCPAs. A major contribution is to (c) apply these individual replication schemes based on our novel reliability calculus for each of the proposed schemes and on environmental conditions such as the Soft Error Rates (SERs) monitored on the system. The strength of our reliability analysis is its use of application execution characteristics that we derive from the compilation process. This guides a system to transparently adopt suitable fault tolerance techniques according to application needs.
