Implementing fault-tolerance in real-time systems by automatic program transformations

Tolga Ayav,Alain Girault,Pascal Fradet

doi:10.1145/1176887.1176917

Abstract

We present a formal approach to implement and certify fault-tolerance in real-time embedded systems. The fault-intolerant initial system consists of a set of independent periodic tasks scheduled onto a set of fail-silent processors. We transform the tasks such that, assuming the availability of an additional spare processor, the system tolerates one failure at a time (transient or permanent). Failure detection is implemented using heartbeating, and failure masking using checkpointing and roll-back. These techniques are described and implemented by automatic program transformations on the tasks' programs. The proposed formal approach to fault-tolerance by program transformation highlights the benefits of separation of concerns and allows us to establish correctness properties.

Full Text