Abstract
Failure Mode and Effects Analysis (FMEA) is a systematic technique for exploring the possible failure modes of individual components or subsystems and determining their potential effects at the system level. FMEA is commonly applied to hardware and communication failures, but analyzing software failures (SW-FMEA) poses a number of challenges. Failures may originate from permanent software faults, commonly called bugs, whose effects can be very subtle and hard to predict because of the complex nature of programs. Therefore, a behavior-based automatic method for analyzing the potential effects of different types of bugs is desirable. Such a method could be used to automatically build an FMEA report about the fault effects, or to evaluate different failure mitigation and detection techniques. This paper follows the latter direction: it uses a model checking-based automated SW-FMEA approach to evaluate error detection and fault tolerance mechanisms, and demonstrates it on a case study inspired by safety-critical embedded operating systems.
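The core idea of SW-FMEA (inject each predefined fault type, then check the system-level effect against a safety specification) can be illustrated in miniature. The sketch below is a simplified stand-in, not the paper's method: instead of model checking, it exhaustively enumerates a small bounded input domain. The component (a toy highest-priority task selector built by the hypothetical `make_pick`), the three fault types, and the safety check `is_safe` are all illustrative assumptions.

```python
from __future__ import annotations

from itertools import permutations
from typing import Callable, Optional

# Hypothetical component: return the name of the highest-priority ready task.
# `better(a, b)` decides whether priority a beats the current best b;
# `drop_last` models an off-by-one loop bound that skips the last task.
def make_pick(better: Callable[[int, int], bool], drop_last: bool = False):
    def pick(ready: list[tuple[str, int]]) -> Optional[str]:
        items = ready[:-1] if drop_last else ready
        best = None
        for name, prio in items:
            if best is None or better(prio, best[1]):
                best = (name, prio)
        return best[0] if best else None
    return pick

# Predefined fault types (programming faults), each realized as a mutant.
FAULT_TYPES = {
    "none (reference)": make_pick(lambda a, b: a > b),
    "inverted relational operator": make_pick(lambda a, b: a < b),
    "non-strict comparison": make_pick(lambda a, b: a >= b),
    "off-by-one loop bound": make_pick(lambda a, b: a > b, drop_last=True),
}

# Hypothetical system-level safety spec: a task of maximal priority is chosen,
# and "no task" is chosen only when nothing is ready.
def is_safe(ready: list[tuple[str, int]], chosen: Optional[str]) -> bool:
    if not ready:
        return chosen is None
    top = max(prio for _, prio in ready)
    return any(name == chosen and prio == top for name, prio in ready)

def analyze(tasks: list[tuple[str, int]]) -> dict[str, Optional[list[tuple[str, int]]]]:
    """Map each fault type to the first input violating the spec (None = no failure found)."""
    inputs = [list(p) for r in range(len(tasks) + 1) for p in permutations(tasks, r)]
    return {
        fault: next((ready for ready in inputs if not is_safe(ready, pick(ready))), None)
        for fault, pick in FAULT_TYPES.items()
    }

if __name__ == "__main__":
    TASKS = [("idle", 0), ("logger", 1), ("control", 2)]
    for fault, witness in analyze(TASKS).items():
        effect = "no failure observed" if witness is None else f"failure on {witness}"
        print(f"{fault:30} {effect}")
```

Each output line is a rudimentary FMEA row: a fault type paired with either a witness input or no observed effect. The "non-strict comparison" fault causes no failure here only because all priorities are distinct, which illustrates how a fault's effect can depend on the behaviors being explored.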
Highlights
The risk of failure is one of the main concerns of safety-critical systems.
The second column belongs to the Master-Checker oracle, which serves as the baseline for comparison (T), while the remaining columns show the results for the other three detectors and their combination.
The bottom two rows summarize detector performance: the number of faults detected and the efficiency, computed with the Reference Model (REF) as the baseline.
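The summary rows described in the highlights (faults detected and efficiency per detector, plus their combination) can be sketched as follows. This is a hypothetical illustration: the detector names `D1` to `D3`, the detection data, and in particular the efficiency definition (fraction of baseline-detected faults that a column also detects) are assumptions, since the exact formula is not given here. The baseline is passed as a parameter.

```python
from __future__ import annotations

def summarize(detections: dict[str, set[str]], detectors: list[str],
              baseline: str) -> dict[str, tuple[int, float]]:
    """Return {column: (faults detected, efficiency)} for each detector and their combination.

    ASSUMPTION: efficiency = fraction of baseline-detected faults that the column also detects.
    """
    base = {f for f, hits in detections.items() if baseline in hits}
    rows: dict[str, tuple[int, float]] = {}
    for column in detectors + ["combined"]:
        wanted = set(detectors) if column == "combined" else {column}
        # A fault counts as detected if any detector in this column flags it.
        found = {f for f, hits in detections.items() if hits & wanted}
        efficiency = len(found & base) / len(base) if base else 0.0
        rows[column] = (len(found), efficiency)
    return rows

if __name__ == "__main__":
    # Made-up detection outcomes for five injected faults (not results from the paper).
    DETECTIONS = {
        "f1": {"REF", "D1", "D2"},
        "f2": {"REF", "D2"},
        "f3": {"REF", "D3"},
        "f4": {"REF"},
        "f5": set(),
    }
    for column, (count, eff) in summarize(DETECTIONS, ["D1", "D2", "D3"], "REF").items():
        print(f"{column:9} detected={count} efficiency={eff:.2f}")
```

The "combined" column treats a fault as detected if at least one detector flags it, which is one plausible reading of "their combination".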
Summary
The risk of failure is one of the main concerns of safety-critical systems. Certification requires the systematic analysis of potential failures, their causes and effects, and the evaluation of the risk mitigation techniques used to reduce the likelihood and severity of system-level failures. Assuming a set of predefined fault types (programming faults) and a specification of safe behavior at the system level, the proposed approach applies model checking to systematically generate execution traces leading from fault activations to states that violate the specification of safe behavior (system-level failures). These traces can be used to understand and demonstrate fault propagation through the system, and as test sequences to reveal actual faults in the final product.
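Generating a trace from a fault activation to a specification violation can be illustrated with explicit-state reachability. The sketch below is a minimal stand-in for a model checker, assuming a small hand-written transition system: breadth-first search from the initial state returns the shortest action sequence that reaches a state violating the safety property. The two-task lock model, the fault type (a task that skips the lock check), and all function names are hypothetical, not the paper's case study or tooling.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Hashable, Iterable, Optional

def find_counterexample(
    initial: Hashable,
    successors: Callable[[Hashable], Iterable[tuple[str, Hashable]]],
    is_unsafe: Callable[[Hashable], bool],
) -> Optional[list[str]]:
    """Breadth-first reachability: shortest action trace to an unsafe state, or None."""
    parent: dict = {initial: None}  # state -> (previous state, action taken)
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if is_unsafe(state):
            trace = []
            while parent[state] is not None:
                state, action = parent[state]
                trace.append(action)
            return trace[::-1]
        for action, nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = (state, action)
                queue.append(nxt)
    return None

# Hypothetical two-task system sharing a lock. State = (pc_A, pc_B, lock_owner).
def step(pc: str, lock: Optional[str], task: str, skip_check: bool):
    if pc == "idle":
        yield f"{task}.request", "wait", lock
    elif pc == "wait" and (lock is None or skip_check):  # the injected fault skips the check
        yield f"{task}.acquire", "crit", task
    elif pc == "crit":
        yield f"{task}.release", "idle", None

def successors(state, faulty_task: Optional[str]):
    a, b, lock = state
    for action, pc, new_lock in step(a, lock, "A", faulty_task == "A"):
        yield action, (pc, b, new_lock)
    for action, pc, new_lock in step(b, lock, "B", faulty_task == "B"):
        yield action, (a, pc, new_lock)

# Safety specification: the two tasks are never in the critical section together.
def violates_mutex(state) -> bool:
    return state[0] == "crit" and state[1] == "crit"

def analyze_fault(faulty_task: Optional[str]) -> Optional[list[str]]:
    return find_counterexample(
        ("idle", "idle", None),
        lambda s: successors(s, faulty_task),
        violates_mutex,
    )

if __name__ == "__main__":
    print("fault-free:", analyze_fault(None))
    print("A skips lock check:", analyze_fault("A"))
```

Because BFS explores states in order of distance from the initial state, the returned trace is a shortest counterexample. This keeps it readable when explaining how the fault propagates, and the action list could be replayed as a test sequence against an implementation.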