Checkpoint Placement Research Articles

Real-time computer systems are often used in harsh environments, such as aerospace and industrial settings, where they are subject to many transient faults while in operation. Checkpointing reduces the recovery time from a transient fault by saving intermediate states of a task in reliable storage and, on detection of a fault, restoring from a previously stored state. The interval between checkpoints affects the execution time of the task. Inserting more checkpoints shortens the interval between them and so reduces the reprocessing time after a fault, but each checkpoint has an execution cost, so extra checkpoints increase the overall task execution time. This trade-off between reprocessing time and checkpointing overhead leads to an optimal checkpoint placement strategy with respect to a chosen performance measure. Real-time control systems are characterized by the timely and correct execution of iterative tasks within deadlines. Reliability is the probability that a system functions according to its specification over a period of time. This paper reports on the reliability of a checkpointed real-time control system in which errors are detected at checkpointing time. The reliability is used as the performance measure for finding the optimal checkpointing strategy. For a single-task control system, the reliability equation over a mission time is derived using a Markov model. Because errors are detected only at checkpointing time, the reliability jitters as the number of checkpoints varies, so other search algorithms are needed to find the optimal number of checkpoints. By exploiting the properties of this jitter, a simple algorithm is provided that finds the optimal number of checkpoints effectively. Finally, the reliability model is extended to multiple tasks by means of a task allocation algorithm.
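
To make the trade-off concrete, here is a minimal Python sketch of a simplified model. It is not the Markov derivation from the paper. It assumes a task of length `work` split into `n` equal segments, each ending in a checkpoint of cost `cost`, Poisson transient faults at rate `rate`, error detection only at the checkpoint, and zero rollback cost. All function and parameter names are hypothetical. In this toy model the number of attempts that fit before the deadline is a floor, so the result is not smooth in `n`, which is one plausible way jitter of this kind can arise.

```python
import math


def deadline_probability(n, work, cost, rate, deadline):
    """Probability that the task finishes by `deadline` using n equal segments.

    Assumed toy model (not from the paper): every attempt at a segment takes
    work/n + cost, because a fault is only noticed at the closing checkpoint.
    """
    segment = work / n + cost             # duration of one attempt
    p_ok = math.exp(-rate * segment)      # no Poisson fault during the attempt
    attempts = int(deadline // segment)   # attempts that fit before the deadline
    if attempts < n:
        return 0.0
    # The task meets the deadline if at least n of the attempts succeed.
    return sum(
        math.comb(attempts, k) * p_ok**k * (1 - p_ok) ** (attempts - k)
        for k in range(n, attempts + 1)
    )


def best_checkpoint_count(work, cost, rate, deadline, max_n=50):
    """Scan every n from 1 to max_n, since the curve need not be unimodal."""
    return max(
        range(1, max_n + 1),
        key=lambda n: deadline_probability(n, work, cost, rate, deadline),
    )


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, round(deadline_probability(n, 10.0, 0.2, 0.05, 16.0), 4))
    print("best n:", best_checkpoint_count(10.0, 0.2, 0.05, 16.0))
```

The exhaustive scan is a deliberate choice for the sketch: when the objective jitters with `n`, a local search can stop at a non-optimal count, whereas the abstract describes a dedicated algorithm that exploits the structure of the jitter instead.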

Real-time computers are often used in embedded, life-critical applications where high reliability is important. A common approach to making such systems dependable is to vote on redundant processors executing multiple copies of the same task. The processors that make up such voted systems are subject not only to independently occurring permanent and transient failures, but also to correlated transients brought about by electromagnetic interference from the operating environment. To counteract these transients, checkpointing and time redundancy are required in addition to processor redundancy. This work analyzes the use of time and device redundancy in systems subject to correlated failure. The tradeoffs in checkpoint placement in such a system are found to be considerably different from those for non-redundant systems without real-time constraints. The authors compare fault-tolerant designs with and without a rollback capability, accounting for the increased hardware-failure rate due to processor duplication when faults are detected in hardware, and the doubled execution times when detection is implemented in software.

Articles published on Checkpoint Placement

A variational calculus approach to optimal checkpoint placement

An optimal checkpointing-strategy for real-time control systems under transient faults

Checkpoint Placement for Fault-Tolerant Real-Time Systems

An on-line algorithm for checkpoint placement

Reliability of checkpointed real-time systems using time redundancy

Optimal Checkpointing of Real-Time Tasks

Optimization criteria for checkpoint placement
