Abstract

Soft real-time systems often must satisfy both timing and probabilistic fault-tolerance requirements. When checkpointing is used for fault tolerance, the checkpointing frequency directly affects the system's overall quality, measured as an integrated value of system QoS properties such as availability, task execution time, and task deadline miss probability. In this paper, we first formally analyze the relationship between the checkpoint interval and each of system availability, task execution time, and task deadline miss probability under a Poisson probabilistic fault model. We then define the system's overall quality as a weighted sum of these three QoS measures and formulate an optimization problem to determine the checkpoint interval that maximizes it. We also present a prototype implementation of a framework that allows adaptive checkpointing, together with a set of experiments run on the framework that further validate our analytical results.
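
To make the trade-off behind the checkpoint interval concrete, the following is a minimal sketch of the execution-time component only. It does not reproduce the paper's formulation: it assumes Poisson faults at a constant rate, a fixed checkpoint cost, free recovery, and that a fault restarts the current segment from its last checkpoint. The function names, parameters, and numeric values are hypothetical, chosen for illustration. The result is compared with Young's classical first-order approximation, sqrt(2 * checkpoint cost / fault rate), which is a swapped-in reference point rather than a result of this paper.

```python
import math


def expected_time(work, interval, ckpt_cost, fault_rate):
    """Expected completion time of `work` units under the assumed model."""
    # One segment is `interval` units of work plus one checkpoint. With Poisson
    # faults at `fault_rate` and a restart of the segment on every fault
    # (recovery assumed free), the expected time to finish a segment of
    # length L is (exp(fault_rate * L) - 1) / fault_rate.
    per_segment = math.expm1(fault_rate * (interval + ckpt_cost)) / fault_rate
    # Simplification: a fractional number of segments is allowed.
    return (work / interval) * per_segment


def best_interval(work, ckpt_cost, fault_rate, iters=200):
    """Ternary search for the interval that minimizes expected_time."""
    # The objective has a single minimum in the interval, so ternary search
    # on [lo, hi] converges to it.
    lo, hi = 1e-6 * work, work
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        t1 = expected_time(work, m1, ckpt_cost, fault_rate)
        t2 = expected_time(work, m2, ckpt_cost, fault_rate)
        if t1 < t2:
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


if __name__ == "__main__":
    # Hypothetical values: 1000 units of work, checkpoint cost 1, fault rate 1e-3.
    work, ckpt_cost, fault_rate = 1000.0, 1.0, 1e-3
    tau = best_interval(work, ckpt_cost, fault_rate)
    young = math.sqrt(2 * ckpt_cost / fault_rate)
    print("numeric optimum:", tau)
    print("Young's approximation:", young)
    print("expected time at optimum:", expected_time(work, tau, ckpt_cost, fault_rate))
```

Checkpointing too often inflates the expected time through checkpoint overhead, and checkpointing too rarely inflates it through re-executed work. A weighted-sum objective such as the one described above would add availability and deadline miss probability terms to this single measure, each with its own dependence on the interval.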
