Abstract

Failure Mode and Effects Analysis (FMEA) is a systematic technique for exploring the possible failure modes of individual components or subsystems and determining their potential effects at the system level. FMEA is commonly applied to hardware and communication failures, but analyzing software failures (SW-FMEA) poses a number of challenges. Failures may originate in permanent software faults, commonly called bugs, and their effects can be subtle and hard to predict because of the complex nature of programs. A behavior-based automatic method for analyzing the potential effects of different types of bugs is therefore desirable. Such a method could be used to automatically build an FMEA report about the fault effects, or to evaluate different failure mitigation and detection techniques. This paper follows the latter direction: it uses a model checking-based automated SW-FMEA approach to evaluate error detection and fault tolerance mechanisms, demonstrated on a case study inspired by safety-critical embedded operating systems.

Highlights

  • The risk of failure is one of the main concerns of safety-critical systems

  • The second column belongs to the Master-Checker oracle, which serves as the baseline of comparison (T), while the remaining columns show results for the other three detectors and their combination

  • The two bottom rows summarize the performance of detectors with the number of faults detected and the efficiency computed with Reference Model (REF) as the baseline


Introduction

The risk of failure is one of the main concerns of safety-critical systems. Certification requires the systematic analysis of potential failures, their causes and effects, and the evaluation of risk mitigation techniques used to reduce the likelihood and severity of system-level failures. Assuming a set of predefined fault types (programming faults) and a specification of safe behavior at the system level, the proposed approach applies model checking to systematically generate execution traces that lead from fault activations to states violating the specification of safe behavior (system-level failures). These traces can be used to understand and demonstrate fault propagation through the system, and as test sequences to reveal actual faults in the final product.
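
The idea of searching for traces from a fault activation to an unsafe state can be illustrated with a minimal sketch. This is not the paper's tool or model: it is a toy explicit-state breadth-first search written for illustration, and the counter system, the `LIMIT` bound, the off-by-one fault type, and all function names are assumptions introduced here.

```python
from collections import deque

LIMIT = 3  # hypothetical safety bound: the counter must never exceed it


def successors(state):
    """Yield (label, next_state) pairs for a toy counter model.

    A state is (value, fault_active). The injected fault type is a
    hypothetical off-by-one guard (<= instead of <).
    """
    value, fault_active = state
    if not fault_active:
        # Fault activation is a nondeterministic choice the search explores.
        yield "activate_fault", (value, True)
    guard = value <= LIMIT if fault_active else value < LIMIT
    if guard:
        yield "inc", (value + 1, fault_active)
    else:
        yield "reset", (0, fault_active)


def is_safe(state):
    """Specification of safe behavior: the counter stays within LIMIT."""
    return state[0] <= LIMIT


def find_failure_trace(initial):
    """Breadth-first search for a shortest trace to an unsafe state."""
    parent = {initial: None}  # state -> (previous state, transition label)
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not is_safe(state):
            # Walk the parent links back to the initial state.
            trace = []
            while parent[state] is not None:
                prev, label = parent[state]
                trace.append(label)
                state = prev
            return list(reversed(trace))
        for label, nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = (state, label)
                queue.append(nxt)
    return None  # no reachable state violates the specification


trace = find_failure_trace((0, False))
print(trace)
```

In this toy model the returned trace starts with the fault activation and continues with the increments that carry the counter past the bound, so it can be replayed as a test sequence. A real model checker would perform the same kind of search over a much richer model and specification language.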

Background
Example
Evaluation of Fault Tolerance Mechanisms and Error Detectors
Conclusion and Future Work