Abstract

Systems designed for high availability and fault tolerance are often configured as a series combination of redundant subsystems. When a unit of a subsystem fails, the system remains operational while the failed unit is repaired; however, if too many units in a subsystem fail concurrently, the system fails. Under conditions usually met in practical situations, we show that the reliability and availability of such systems can be accurately modeled by representing each redundant subsystem with a constant, ‘effective’ failure rate equal to the inverse of the subsystem mean‐time‐to‐failure (MTTF). The approximate model is surprisingly accurate, with an error on the order of the square of the ratio of mean‐time‐to‐repair to mean‐time‐to‐failure (MTTR/MTTF), and it has wide applicability to commercial high‐availability and fault‐tolerant computer systems. The effective subsystem failure rates can be used to: (1) evaluate the system and subsystem reliability and availability; (2) estimate the system MTTF; and (3) provide a basis for the iterative analysis of large complex systems. Some observations from renewal theory suggest that the approximate models can be used even when the unit failure rates are not constant and when the redundant units are not homogeneous. Copyright © 2004 John Wiley & Sons, Ltd.
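
As a rough illustration of the effective-rate idea, the following minimal Python sketch treats each redundant subsystem as a single block with constant failure rate 1/MTTF and combines the blocks in series. It is not taken from the paper: the function names are hypothetical, and the helper duplex_mttf assumes one particular subsystem (two identical active units, constant failure rate per unit, a single repair facility with constant repair rate), for which the textbook Markov result MTTF = (3λ + μ)/(2λ²) applies. Other redundancy schemes would need their own MTTF expression.

```python
import math


def duplex_mttf(unit_failure_rate, unit_repair_rate):
    """MTTF of a two-unit active-redundant pair with one repair facility.

    Assumption for illustration only: identical units, exponential failure
    and repair times. Standard Markov-model result: (3*lam + mu) / (2*lam**2).
    """
    lam, mu = unit_failure_rate, unit_repair_rate
    return (3.0 * lam + mu) / (2.0 * lam ** 2)


def effective_failure_rate(subsystem_mttf):
    """Constant 'effective' failure rate: the inverse of the subsystem MTTF."""
    return 1.0 / subsystem_mttf


def system_failure_rate(subsystem_mttfs):
    """Series combination: the effective rates of the subsystems add."""
    return sum(effective_failure_rate(m) for m in subsystem_mttfs)


def system_mttf(subsystem_mttfs):
    """Approximate system MTTF under the constant-rate model."""
    return 1.0 / system_failure_rate(subsystem_mttfs)


def system_reliability(subsystem_mttfs, t):
    """Approximate probability that the series system survives to time t."""
    return math.exp(-system_failure_rate(subsystem_mttfs) * t)


if __name__ == "__main__":
    # Hypothetical numbers: unit MTTF of 1000 h, unit MTTR of 10 h.
    pair = duplex_mttf(unit_failure_rate=1e-3, unit_repair_rate=0.1)
    subsystems = [pair, pair, pair]  # three identical duplex subsystems in series
    print("subsystem MTTF (h):", pair)
    print("system MTTF (h):", system_mttf(subsystems))
    print("R(1000 h):", system_reliability(subsystems, 1000.0))
```

The sketch only makes sense in the regime the abstract describes, where repair is much faster than failure, so that the error term of order (MTTR/MTTF)² stays small.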
