Abstract
This paper presents a quantitative reliability analysis of a system designed to tolerate both hardware and software faults. The system under study achieves integrated fault tolerance by implementing N-version programming (NVP) on redundant hardware. The analysis considers independent software faults, related software faults, transient hardware faults, permanent hardware faults, and imperfect coverage. The overall model is a Markov reward model in which the states of the Markov chain represent the long-term evolution of the system's structure. For each operational configuration, a fault tree model captures the effects of software faults and transient hardware faults on the task computation. The fault tree models define the reward structure for the overall model. The software fault model is parameterized using experimental data from a recent implementation of an NVP system built with the current design paradigm, and its predictions of software failures are very close to the empirical data. The hardware model is parameterized using typical failure rates for hardware faults and typical coverage parameters. Results from our study show that it is important to consider both hardware and software faults in the reliability analysis of an NVP system, since these estimates increase with time. Moreover, the error detection and recovery function is extremely important to fault-tolerant software.
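To make the model structure concrete, the following is a minimal illustrative sketch, not the authors' model. It assumes a three-state Markov chain (three hosts, two hosts, failed), a simplified fault tree (a related fault OR loss of majority agreement), and it merges independent software faults and transient hardware faults into one per-version failure probability. All names and parameter values (`lam`, `c`, `q_version`, `q_related`) are hypothetical placeholders rather than figures from the paper.

```python
import math

def task_failure_prob(n_hosts, q_version, q_related):
    """Fault-tree-style probability that a single task computation fails.

    Top event (OR gate): a related fault defeats all versions, or fewer than
    a majority of the n_hosts versions return a correct result.
    q_version lumps independent software faults and transient hardware faults.
    """
    need = n_hosts // 2 + 1  # versions that must agree on the correct result
    p_majority = sum(
        math.comb(n_hosts, i) * (1 - q_version) ** i * q_version ** (n_hosts - i)
        for i in range(need, n_hosts + 1)
    )
    return 1 - (1 - q_related) * p_majority

def transient_probs(Q, pi0, t, tol=1e-12):
    """State probabilities of a continuous-time Markov chain at time t,
    computed by uniformization (suitable only for moderate rate * t)."""
    n = len(Q)
    rate = max(-Q[i][i] for i in range(n)) or 1.0
    if rate * t > 700:
        raise ValueError("rate * t too large for this simple sketch")
    # Discrete-time transition matrix of the uniformized chain.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    weight = math.exp(-rate * t)  # Poisson probability of 0 jumps
    vec = list(pi0)
    result = [weight * v for v in vec]
    total, k = weight, 0
    while 1 - total > tol:
        k += 1
        vec = [sum(vec[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * t / k    # Poisson probability of k jumps
        total += weight
        result = [r + weight * v for r, v in zip(result, vec)]
    return result

# Hypothetical parameters, chosen only for illustration.
lam = 1e-4        # permanent hardware fault rate per host-hour
c = 0.999         # coverage: probability a permanent fault is handled
q_version = 1e-3  # per-task failure probability of one version
q_related = 1e-5  # per-task probability of a related software fault

# States: 0 = three hosts, 1 = two hosts, 2 = system failed (absorbing).
Q = [
    [-3 * lam, 3 * lam * c, 3 * lam * (1 - c)],
    [0.0, -2 * lam, 2 * lam],
    [0.0, 0.0, 0.0],
]

# Reward structure: per-task failure probability in each configuration.
reward = [
    task_failure_prob(3, q_version, q_related),
    task_failure_prob(2, q_version, q_related),
    1.0,
]

def unreliability(t):
    """Expected reward at time t: probability that a task run at t fails."""
    pi = transient_probs(Q, [1.0, 0.0, 0.0], t)
    return sum(p * r for p, r in zip(pi, reward))

if __name__ == "__main__":
    for hours in (0, 10, 100, 1000):
        print(hours, unreliability(hours))
```

In this sketch the Markov chain tracks only permanent hardware faults and coverage, while the per-state fault tree supplies the reward, mirroring the two-level structure described above. Lowering `c` increases the direct transition rate from the three-host state to the failed state, which is one way to explore the sensitivity to imperfect coverage.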