Abstract
Computer systems for critical applications must be designed to tolerate software faults as well as hardware faults. A unified approach to tolerating hardware and software faults is characterized by classifying faults in terms of duration (transient or permanent) rather than source (hardware or software). Errors arising from transient faults can be handled through masking or voting, but errors arising from permanent faults require system reconfiguration to bypass the failed component. Most errors caused by software faults can be considered transient, in that they are input dependent. Quantitative dependability analysis of systems that exhibit a unified approach to fault tolerance can be performed by a hierarchical combination of fault tree and Markov models. In this paper, a methodology for analyzing hardware and software fault tolerant systems is applied to the analysis of a hypothetical system, loosely based on the fault tolerant parallel processor (FTPP). The model considers both transient and permanent faults, hardware and software faults, unrelated and related software faults, and automatic recovery and reconfiguration. The parameter values for the software part of the model are determined from an experimental implementation of an N-version programming application. The parameter values chosen for the hardware part of the model are considered fairly typical.
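To make the hierarchical combination of fault tree and Markov models concrete, the following is a minimal sketch and not the model from the paper. It assumes a triplex hardware configuration with only permanent faults at a constant rate `lam` per unit, a three-version software voter with a per-run unrelated-fault probability and a related-fault probability, and independence between the hardware and software branches. All function names and parameter values are hypothetical, and transient faults, recovery, and reconfiguration are deliberately left out.

```python
import math


def markov_unreliability(lam, t, steps=100_000):
    """Euler-integrate a 3-state Markov chain: 3 units up -> 2 units up -> failed.

    Assumption: permanent faults only, constant rate `lam` per unit, no repair.
    Returns the probability of being in the failed state at time t.
    """
    p3, p2, pf = 1.0, 0.0, 0.0
    dt = t / steps
    for _ in range(steps):
        d3 = -3 * lam * p3                  # any of three units fails
        d2 = 3 * lam * p3 - 2 * lam * p2    # enter from 3-up, leave on second fault
        df = 2 * lam * p2                   # second fault defeats the majority
        p3, p2, pf = p3 + d3 * dt, p2 + d2 * dt, pf + df * dt
    return pf


def software_failure_prob(p_unrelated, p_related):
    """Per-run failure probability of a 3-version majority voter.

    Assumption: a related fault defeats the vote outright; otherwise the vote
    fails when at least two versions fail independently.
    """
    two_or_more = 3 * p_unrelated**2 * (1 - p_unrelated) + p_unrelated**3
    return p_related + (1 - p_related) * two_or_more


def or_gate(*probs):
    """Fault tree OR gate over independent basic events."""
    survive = 1.0
    for p in probs:
        survive *= 1.0 - p
    return 1.0 - survive


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    hw = markov_unreliability(lam=1e-4, t=10.0)
    sw = software_failure_prob(p_unrelated=1e-3, p_related=1e-6)
    print("hardware:", hw, "software:", sw, "system:", or_gate(hw, sw))
```

The Markov submodel supplies the hardware branch's failure probability, and the fault tree combines it with the software branch at the top level, which is the sense in which the analysis is hierarchical. For this simple chain the result can be checked against the closed form 1 - (3e^{-2λt} - 2e^{-3λt}).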