Abstract

Fault-tolerant unit (FTU) management is a critical process for a dependable system with replicated components. In a safety-critical application on a CAN network, a protocol is needed to manage error detection and fault containment regions properly. This work details a distributed protocol that switches efficiently and accurately from a failed primary node to one of a number of replicated nodes. Upon node failure, the protocol holds the information needed to determine the new primary node in the system accurately. As a result, failed components exhibit fail-silent behavior, and no data is lost on the network when a component fails. With the protocol engaged, distributed applications execute seamlessly in systems with multiple replicated nodes in the presence of failures. The protocol has been implemented, tested, and evaluated as part of a distributed safety-critical architecture.
