Abstract

Fault tolerance is an essential architectural attribute for achieving high reliability in many critical applications of digital systems. Automatic fault and error handling mechanisms play a crucial role in implementing fault tolerance, because an uncovered (undetected) fault may lead to a system or subsystem failure even when adequate redundancy exists. Examples of this effect can be found in computing systems, electrical power distribution networks, pipelines carrying dangerous materials, and similar systems. Because an uncovered fault may lead to overall system failure, an excessive level of redundancy may even reduce system reliability. We consider three types of coverage models: (1) element-level coverage, where the fault coverage probability of an element does not depend on the states of other elements; (2) multi-fault coverage, where the effectiveness of recovery mechanisms depends on the coexistence of multiple faults in a group of elements that collectively participate in detecting and recovering from the faults in that group; and (3) performance-dependent coverage, where the effectiveness of recovery mechanisms in a group depends on the entire performance level of that group. The paper presents a modification of the generalized reliability block diagram (RBD) method for evaluating reliability and performance indices of complex multi-state series-parallel systems with all of these types of fault coverage. The suggested method, based on a universal generating function technique, allows the system performance distribution to be obtained using a straightforward recursive procedure.
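
The claim that excessive redundancy can reduce reliability can be illustrated with a minimal sketch. It is not the paper's RBD or universal generating function method. It assumes a binary-state parallel group of identical, independent elements under element-level coverage, where any single uncovered failure brings down the whole group. The function names and the values of p (element reliability) and c (coverage probability) are hypothetical and chosen only for illustration.

```python
from math import comb


def parallel_reliability(n, p, c):
    """Reliability of n identical parallel elements with imperfect coverage.

    Assumptions (illustrative, not taken from the paper):
    - each element works with probability p, independently of the others;
    - each failure is covered (detected and isolated) with probability c;
    - a single uncovered failure causes the whole group to fail;
    - the group works if at least one element works.
    """
    q_cov = (1.0 - p) * c  # probability that an element fails AND is covered
    # Sum over k working elements; the remaining n - k all fail covered.
    return sum(comb(n, k) * p**k * q_cov**(n - k) for k in range(1, n + 1))


def best_redundancy(p, c, n_max):
    """Number of elements in 1..n_max that maximizes group reliability."""
    return max(range(1, n_max + 1), key=lambda n: parallel_reliability(n, p, c))


if __name__ == "__main__":
    p, c = 0.9, 0.9  # hypothetical values
    for n in range(1, 6):
        print(n, round(parallel_reliability(n, p, c), 6))
    print("best n:", best_redundancy(p, c, 5))
```

Under these assumptions, with p = 0.9 and c = 0.9 the reliability rises from 0.9 for one element to 0.972 for two, then declines as further elements are added, because each extra element adds another chance of an uncovered failure. With perfect coverage (c = 1) the same function reduces to the familiar 1 - (1 - p)^n and increases monotonically with n.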
