Abstract

This paper presents a quantitative reliability analysis of a system designed to tolerate both hardware and software faults. The system achieves integrated fault tolerance by implementing N-version programming (NVP) on redundant hardware. The system analysis considers unrelated software faults, related software faults, transient hardware faults, permanent hardware faults, and imperfect coverage. The overall model is Markov in which the states of the Markov chain represent the long-term evolution of the system-structure. For each operational configuration, a fault-tree model captures the effects of software faults and transient hardware faults on the task computation. The software fault model is parameterized using experimental data associated with a recent implementation of an NVP system using the current design paradigm. The hardware model is parameterized by considering typical failure rates associated with hardware faults and coverage parameters. The authors results show that it is important to consider both hardware and software faults in the reliability analysis of an NVP system, since these estimates vary with time. Moreover, the function for error detection and recovery is extremely important to fault-tolerant software. Several orders of magnitude reduction in system unreliability can be observed if this function is provided promptly. >

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call