Abstract

Fault tolerance has been an essential architectural attribute for achieving high reliability in many critical applications of digital systems. Automatic fault detection, location, isolation, recovery, and reconfiguration mechanisms play a crucial role in implementing fault tolerance, because an uncovered fault may lead to a system or subsystem failure even when adequate redundancy exists. The probability of successfully recovering from a fault, given that the fault has occurred, is known as the coverage factor (or simply coverage), and it is used to account for the efficiency of fault-tolerant mechanisms. If the fault and error handling mechanisms cannot successfully cover all faults in the system, the coverage factor is less than unity and the system is said to have imperfect coverage. Models that consider the effects of imperfect fault coverage are known as imperfect fault coverage models, imperfect coverage models, fault coverage models, or simply coverage models. For systems with imperfect fault coverage, an excessive level of redundancy may even reduce system reliability. An accurate analysis must therefore account not only for the system structure but also for the system's fault and error handling behavior, which is often called coverage behavior. The appropriate coverage modeling approach depends on the type of fault-tolerant techniques used. In this chapter, we present the status and trends of imperfect coverage models and the associated reliability analysis techniques. We also present historical developments, modeling approaches, reliability algorithms, optimal design policies, and available software tools.
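
The claim that excessive redundancy may reduce reliability can be illustrated with a small numerical sketch. The model below is an assumption chosen for illustration, not necessarily one of the models treated in the chapter: n identical, independent components in parallel, each with reliability r; a component failure is covered with probability c; and any single uncovered failure brings down the whole system. Under these assumptions the system works when no failure is uncovered and at least one component survives, which gives R(n) = (r + c(1 - r))^n - (c(1 - r))^n. The function names are hypothetical.

```python
def parallel_reliability(r: float, c: float, n: int) -> float:
    """Reliability of n parallel components under the assumed coverage model.

    r: reliability of one component
    c: coverage factor (probability that a failure is handled successfully)
    Assumption: one uncovered failure anywhere fails the whole system.
    """
    covered_fail = c * (1.0 - r)  # component fails, but the failure is covered
    # P(no uncovered failure) minus P(all components fail, all covered)
    return (r + covered_fail) ** n - covered_fail ** n


def best_redundancy(r: float, c: float, max_n: int = 10) -> int:
    """Number of components (1..max_n) that maximizes system reliability."""
    return max(range(1, max_n + 1), key=lambda n: parallel_reliability(r, c, n))


if __name__ == "__main__":
    # With r = 0.9 and c = 0.9, reliability peaks at a small n and then declines.
    for n in range(1, 6):
        print(n, round(parallel_reliability(0.9, 0.9, n), 6))
    print("best n:", best_redundancy(0.9, 0.9))
```

With perfect coverage (c = 1) the expression reduces to the familiar 1 - (1 - r)^n, which never decreases as n grows; with c < 1 each added component also adds a chance of an uncovered failure, so the curve has a maximum.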
