Abstract

Failure recovery is a fundamental task of the dependable systems needed to achieve fault-tolerant communications, smooth operation of system components and a comfortable user interface. Tree topologies are fragile, yet they are quite popular structures in computer systems. The term survivable tree denotes the capability of the tree network to deliver messages even in the presence of failures. In this paper, we analyze the characteristics of large-scale overlay survivable trees and identify the requirements for general-purpose failure recovery mechanisms in such an environment. We outline a generic failure recovery platform for preplanned tree restoration which meets those requirements, and we focus primarily on its completeness and correctness properties. The platform is based on bypass rings and it uses a bypass routing algorithm to ensure completeness, and specialized leader election to guarantee correctness. The platform supports multiple, on-line and on-the-fly recovery, provides an optional level of fault-tolerance, protection selectivity and optimization capability. It is independent of the protected tree type (regarding traffic direction, number of sources, etc.) and forms a basis for application-specific fragment reconnection.

Highlights

  • We outline a failure recovery platform for preplanned tree restoration based on bypass rings ([2], [3]) – cyclical redundant structures to be used in the event of failure to locate and reconnect the tree fragments

  • Guaranteeing correctness is the task of the leader link election (LLE) process, which is based on comparing the hierarchical identifiers of the fragments ([2])


Summary

Introduction

Failure recovery is a fundamental task of the dependable systems needed to achieve fault-tolerant communications, smooth operation of system components and a comfortable user interface. Popular distributed applications providing data sharing, content distribution or stream data delivery services include many different computers, often at distant geographical locations. To communicate between their nodes, these applications build tree-topology overlay structures to connect the nodes and distribute information. The failure recovery schemes for overlay trees use the underlying network to build a completely new tree or to restore the tree keeping its original structure. We outline a failure recovery platform for preplanned tree restoration based on bypass rings ([2], [3]) – cyclical redundant structures to be used in the event of failure to locate and reconnect the tree fragments.
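To make the notion of tree fragments concrete, the following minimal sketch shows how the failure of a single node splits a tree into disconnected fragments that a recovery mechanism would then have to locate and reconnect. It is only an illustration under stated assumptions: the parent-map representation, the function name `fragments_after_failure` and the sample tree are hypothetical, and the sketch does not reproduce the bypass ring platform itself.

```python
from collections import defaultdict


def fragments_after_failure(parent, failed):
    """Return the connected fragments left when `failed` is removed from a tree.

    `parent` maps each node to its parent; the root maps to None.
    (Hypothetical representation chosen for illustration only.)
    """
    # Build an undirected adjacency list over the surviving nodes,
    # dropping every edge that touches the failed node.
    survivors = [node for node in parent if node != failed]
    adjacency = defaultdict(set)
    for node in survivors:
        p = parent[node]
        if p is not None and p != failed:
            adjacency[node].add(p)
            adjacency[p].add(node)

    # Collect connected components with an iterative depth-first search.
    seen = set()
    fragments = []
    for start in survivors:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            current = stack.pop()
            component.add(current)
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        fragments.append(component)
    return fragments


# Sample tree: r is the root; a and b are its children; c and d hang under a; e under c.
tree = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "c"}
frags = fragments_after_failure(tree, "a")
```

In this sample, removing the inner node `a` leaves one fragment containing the root and one fragment per child subtree of `a`, which is why a single failure in a tree can disconnect many nodes at once.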

Related work
Main issues of overlay failure recovery
Survivable trees
Bypass ring platform
Bypass routing
Leader link election
Fragment reconnection
Discussion
Conclusion
