Abstract

A precise, stable, and fault-tolerant clock is a fundamental prerequisite in safety-critical real-time systems. However, combining redundant independent clock sources to form a unified fault-tolerant clock supply is non-trivial, especially when redundant clock outputs are required – e.g., for supplying the replicated nodes within a TMR architecture through a clock network that does not suffer from a single point of failure. Having these outputs fail independently while still keeping them tightly synchronized is highly desirable, as it substantially eases the design of the overall architecture. In this paper we address exactly this challenge. Our approach extends an existing, ring-oscillator-like distributed clock generation scheme by augmenting each of its constituent nodes with a stable clock reference. We introduce the appropriately modified algorithm and illustrate its operation by simulation experiments. These experiments further demonstrate that the four clock outputs of our circuit do not share a single point of failure, have small and bounded skew, remain stabilized to one crystal source during normal operation, do not propagate glitches from one failed clock to a correct one, and only exhibit slightly extended clock cycles during a short stabilization period after a component failure. In addition, we give a rigorous formal proof for the correctness of the algorithm on an abstraction level that is close to the implementation.

Highlights

  • Computers are being entrusted with safety-critical functions in a rapidly increasing number of applications, with autonomous vehicles being just one recent example

  • One threat to triple-modular redundant (TMR) architectures is the so-called common-mode failure: if two of the three redundant nodes fail in the same way, the voter will select the erroneous result

  • Our envisioned use case is a TMR system whose redundant nodes shall be supplied with a clock that does not constitute a single point of failure


Summary

INTRODUCTION

Computers are being entrusted with safety-critical functions in a rapidly increasing number of applications, with autonomous vehicles being just one recent example. While many alternative fault-tolerance techniques are available, (coarse-grain) triple-modular redundant (TMR) architectures have gained much popularity. This is partly due to the high error detection coverage they can attain through their “output centric” approach: no matter what the actual cause may be – the voter just takes the majority of matching outputs and masks the faulty one. Another beneficial feature of TMR is its simplicity: the redundant nodes can be off-the-shelf components (or IP modules) without any special features or extensions. The whole architecture is operated in lock-step, which significantly simplifies the voter. At this point the clock potentially becomes a single point of failure, unless it can be furnished with fault tolerance as well.
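The majority vote described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `tmr_vote` and the choice to return `None` when no two outputs match are assumptions made purely for illustration, and a real voter would be a hardware circuit sampling the replicas in lock-step.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def tmr_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by at least two of three replicas.

    Hypothetical helper for illustration only. Returns None when all
    three outputs differ, i.e. when no majority exists.
    """
    if len(outputs) != 3:
        raise ValueError("a TMR voter expects exactly three replica outputs")
    # most_common(1) yields the single most frequent value and its count.
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None


# A single faulty replica is masked by the two matching outputs.
single_fault = tmr_vote([42, 42, 7])

# Common-mode failure: two replicas fail identically, so the voter
# selects their (erroneous) value over the one correct replica.
common_mode = tmr_vote([7, 7, 42])

# Three differing outputs leave the voter without a majority.
no_majority = tmr_vote([1, 2, 3])
```

The second call shows why a voter cannot defend against two replicas failing in the same way: it only compares outputs and has no notion of which one is correct.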

AND RELATED WORK
REQUIREMENTS
Starting point
Proposed extension with stable sources
Formalization of the modified algorithm
Steady-state operation of the extended algorithm
Model and preliminaries
Correctness analysis
EXPERIMENTAL EVALUATION
Steady state operation
Failure of the fastest TS node
Changing frequency
Discussion
CONCLUSION