Reliability analysis of a hardware and software fault tolerant parallel processor

J. B. Dugan

doi:10.1109/reldis.1994.336907

https://doi.org/10.1109/reldis.1994.336907

Publication Date: Oct 25, 1994

Citations: 5

Affiliation: University of Virginia

Abstract

Computer systems for critical applications must be designed to tolerate software faults as well as hardware faults. A unified approach to tolerating hardware and software faults classifies faults by duration (transient or permanent) rather than by source (hardware or software). Errors arising from transient faults can be handled through masking or voting, but errors arising from permanent faults require system reconfiguration to bypass the failed component. Most errors caused by software faults can be considered transient, in that they are input dependent. Quantitative dependability analysis of systems that take a unified approach to fault tolerance can be performed by a hierarchical combination of fault tree and Markov models. In this paper, a methodology for analyzing hardware and software fault tolerant systems is applied to a hypothetical system loosely based on the fault tolerant parallel processor (FTPP). The model considers both transient and permanent faults, hardware and software faults, unrelated and related software faults, and automatic recovery and reconfiguration. The parameter values for the software part of the model are determined from an experimental implementation of an N-version programming application. The parameter values chosen for the hardware part of the model are considered fairly typical.
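The hierarchical idea in the abstract (solve a Markov submodel, then feed its result into a fault tree) can be illustrated with a small Python sketch. Everything in it is an assumption for illustration only: a hypothetical triplex processor subject to permanent faults at rate `lam` per processor with coverage factor `c`, a three-version N-version programming application with per-demand probabilities `q_unrel` (an independent failure of one version) and `q_rel` (a related fault that defeats the vote), a hypothetical number of `demands` per mission, and statistical independence between hardware and software failures. It does not reproduce the paper's FTPP model, which also covers transient faults and automatic recovery, and no parameter values here come from the paper.

```python
import math


def sw_failure_per_demand(q_unrel: float, q_rel: float) -> float:
    """Per-demand failure probability of a hypothetical 3-version NVP system.

    Fault tree (assumed): OR(related fault, 2-of-3 gate over independent
    version failures). A related fault is assumed to defeat the majority vote.
    """
    # 2-of-3 gate: at least two of the three versions fail independently.
    two_of_three = 3 * q_unrel**2 * (1 - q_unrel) + q_unrel**3
    # OR gate: either a related fault occurs, or (absent one) the vote is lost.
    return q_rel + (1 - q_rel) * two_of_three


def hw_reliability(lam: float, c: float, t: float) -> float:
    """Reliability at time t of a hypothetical triplex processor (Markov chain).

    Assumed states and transitions (permanent faults only):
      3 good --3*lam*c-->     2 good (covered fault, reconfigure to duplex)
      3 good --3*lam*(1-c)--> failed (uncovered fault)
      2 good --2*lam-->       failed
    Closed-form solution of the Kolmogorov forward equations.
    """
    p3 = math.exp(-3 * lam * t)
    p2 = 3 * c * (math.exp(-2 * lam * t) - math.exp(-3 * lam * t))
    return p3 + p2


def system_reliability(lam: float, c: float, t: float,
                       q_unrel: float, q_rel: float, demands: int) -> float:
    """Top-level fault tree: system fails if hardware OR software fails.

    Assumes hardware and software failures are independent and that
    software demands fail independently of one another.
    """
    r_hw = hw_reliability(lam, c, t)
    r_sw = (1 - sw_failure_per_demand(q_unrel, q_rel)) ** demands
    return r_hw * r_sw


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the code.
    print(system_reliability(lam=1e-4, c=0.999, t=10.0,
                             q_unrel=1e-4, q_rel=1e-6, demands=1000))
```

With perfect coverage (`c = 1`) the Markov submodel reduces to the familiar triple modular redundancy result 3r² − 2r³ with r = e^(−λt), which is a quick sanity check. Solving the Markov part separately and passing only its output up to the fault tree is what keeps each level of the hierarchy small; a fuller model would add transient-fault and recovery states to the chain.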

Full Text