Abstract

Fault tolerance is an essential architectural attribute for achieving high reliability in many critical applications of digital systems. Automatic recovery and reconfiguration mechanisms play a crucial role in implementing fault tolerance, because an uncovered fault may lead to a system or subsystem failure even when adequate redundancy exists. Excessive redundancy may even reduce system reliability, in addition to consuming system resources. An accurate reliability analysis must therefore account not only for the system structure but also for the system's fault and error handling behavior. Models that capture this fault and error handling behavior are called coverage models, and the appropriate coverage modeling approach depends on the type of fault-tolerant technique used. This paper describes and demonstrates a solution methodology that determines the optimal design configurations that maximize the reliability of fault-tolerant systems subject to imperfect fault coverage and resource constraints. The system is assumed to consist of several subsystems in series, each containing multiple redundant components. The problem formulation considers generic types of fault-tolerant mechanisms and their associated coverage models for each subsystem. The objective of the optimal design is to select, for each subsystem, the design configuration, the type of components, and the fault-tolerant mechanism from the applicable and available choices. Optimal solutions are determined using an equivalent problem formulation and integer programming. The methodology is flexible and can accurately model a wide range of fault-tolerant systems used in safety-critical applications. It is successfully demonstrated on a large problem with 14 subsystems and 4 component choices for each subsystem.
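
The claim that excessive redundancy can reduce reliability can be illustrated with a minimal sketch. It is not the paper's formulation. It assumes one common coverage model: a 1-out-of-n parallel subsystem of identical, independent components, each with reliability r, in which every component failure is covered with probability c and any uncovered failure fails the subsystem. Under those assumptions the subsystem reliability is (r + c(1 - r))^n - (c(1 - r))^n. The function names and the numeric values below are hypothetical and chosen only for illustration.

```python
def parallel_reliability(n, r, c):
    """Reliability of a 1-out-of-n subsystem under the assumed coverage model.

    n: number of redundant components
    r: reliability of each component (assumed identical and independent)
    c: probability that a component failure is covered (handled correctly)
    """
    q = 1.0 - r  # component unreliability
    # The subsystem survives when every failure is covered and at least one
    # component still works: sum over i = 0..n-1 failures of
    # C(n, i) * r**(n - i) * (c*q)**i, which collapses to the line below.
    return (r + c * q) ** n - (c * q) ** n


def best_redundancy(r, c, n_max=20):
    """Redundancy level in 1..n_max with the highest reliability (brute force)."""
    return max(range(1, n_max + 1), key=lambda n: parallel_reliability(n, r, c))


if __name__ == "__main__":
    # Illustrative values only: r = 0.9, c = 0.9.
    for n in range(1, 6):
        print(n, round(parallel_reliability(n, 0.9, 0.9), 6))
    print("best n:", best_redundancy(0.9, 0.9))
```

With perfect coverage (c = 1) the expression reduces to the familiar 1 - (1 - r)^n, which never decreases as n grows. With c < 1, each added component also adds another chance of an uncovered failure, so reliability peaks at a finite n. This is why the redundancy level has to be optimized rather than simply maximized.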
