On the Fault Hypothesis for a Safety-Critical Real-Time System

H Kopetz

doi:10.1007/11823063_3

Abstract

A safety-critical real-time computer system must provide its services with a dependability that is much better than the dependability of any one of its constituent components. This challenging goal can only be achieved by the provision of fault tolerance. The design of any fault-tolerant system proceeds in four distinct phases. In the first phase, the fault hypothesis is shaped, i.e., assumptions are made about the types and numbers of faults that must be tolerated by the planned system. In the second phase, an architecture is designed that tolerates the specified faults. In the third phase, the architecture is implemented and the functions and fault-tolerance mechanisms are validated. Finally, in the fourth phase, it has to be confirmed experimentally that the assumptions contained in the fault hypothesis are met by reality. The first part of this contribution focuses on the establishment of a comprehensive fault hypothesis for safety-critical real-time computer systems. The size of the fault containment regions, the failure mode of the fault containment regions, the assumed frequency of the faults, and the assumptions about error detection latency and error containment are discussed under the premise that, in the future, a distributed system node is expected to be a system-on-a-chip (SOC). The second part of this contribution focuses on the implications that such a fault hypothesis will have on the future architecture of distributed safety-critical real-time computer systems in the automotive domain.

Full Text