Abstract

This paper describes a two-fold approach to utilizing Triple Modular Redundancy (TMR) in a wireless adhoc network (AdhocNet). A distributed checkpointing and recovery protocol is proposed. The protocol eliminates useless checkpoints and, on recovery, selects only the processes that are dependent within the concerned checkpointing interval. A process starts recovery from its last checkpoint only if it finds that it is dependent (directly or indirectly) on the faulty process. The recovery protocol also prevents the occurrence of missing or orphan messages. In AdhocNet, a set of three nodes (connected to each other) forms a TMR set, with the nodes designated as main, primary and secondary. A main node in one set may serve as primary or secondary in another. Computation is not triplicated; instead, a checkpoint taken by the main node is duplicated in its primary so that the primary can continue if the main fails. The checkpoint taken by the primary is then duplicated in the secondary in case the primary fails too.
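The rule that a process recovers only if it depends, directly or indirectly, on the faulty process amounts to a reachability computation over the dependencies of the current checkpointing interval. The following is a minimal sketch of that idea, not the paper's algorithm: the function name `rollback_set` and the `depends_on` mapping (each process mapped to the processes it received messages from in the interval) are assumptions made for illustration.

```python
from collections import deque


def rollback_set(faulty, depends_on):
    """Return the processes that must roll back when `faulty` fails.

    `depends_on` is a hypothetical representation: it maps each process
    to the set of processes it received messages from during the
    current checkpointing interval.
    """
    # Invert the relation so we can look up who depends on a given process.
    dependents = {}
    for proc, sources in depends_on.items():
        for src in sources:
            dependents.setdefault(src, set()).add(proc)

    # Breadth-first search from the faulty process collects every
    # direct and indirect dependent.
    affected = {faulty}
    queue = deque([faulty])
    while queue:
        current = queue.popleft()
        for proc in dependents.get(current, ()):
            if proc not in affected:
                affected.add(proc)
                queue.append(proc)
    return affected


# P1 received from P0, P2 received from P1, P3 received from nobody.
interval = {"P1": {"P0"}, "P2": {"P1"}, "P3": set()}
to_recover = rollback_set("P0", interval)
```

In this example P1 depends directly on P0 and P2 depends on it indirectly through P1, so both roll back together with P0, while P3 continues without recovery.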

Highlights

  • Distributed systems that execute processes on different nodes connected by a communication network [6] are prone to failure

  • The concept of Triple Modular Redundancy (TMR) is utilized in this work as a measure for achieving fault tolerance in a wireless adhoc network (AdhocNet), where a group of three nodes, known as mobile hosts (MHs), forms the three replicas

  • The checkpointing algorithm proposed in this paper constructs consistent checkpoints in a distributed manner


Summary

INTRODUCTION

Distributed systems that execute processes on different nodes connected by a communication network [6] are prone to failure. The concept of TMR is utilized in this work as a measure for achieving fault tolerance in a wireless adhoc network (AdhocNet), where a group of three nodes, known as mobile hosts (MHs), forms the three replicas. Fault tolerance may be achieved by periodically using the stable storage of the MHs to save the processes' states, better known as checkpoints, during failure-free execution. Processes take local checkpoints after being notified by the initiator, except in special cases described in later sections. The processes synchronize their activities of the current checkpointing interval before committing their checkpoints. This paper shows that any global checkpoint taken in this fashion in the present system is consistent, that unnecessary checkpoints are eliminated, and that the system has to roll back only to the last saved state in case of a failure.
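The sequence described above (notification by the initiator, local checkpoints, synchronization, then commit) can be illustrated with a small sketch. It assumes a two-step structure in which each process first takes a tentative checkpoint and acknowledges it, and all processes commit only once every acknowledgement has arrived. The class `Process`, the function `run_checkpoint_round` and the integer stand-in for process state are hypothetical, and the special cases mentioned in the text are not modeled.

```python
class Process:
    """A process on a mobile host, with an integer standing in for its state."""

    def __init__(self, name):
        self.name = name
        self.state = 0
        self.tentative = None  # checkpoint taken but not yet committed
        self.committed = 0     # last checkpoint held on stable storage

    def take_tentative(self):
        # Save the current state without replacing the committed checkpoint.
        self.tentative = self.state
        return True  # acknowledgement sent back to the initiator

    def commit(self):
        self.committed = self.tentative
        self.tentative = None

    def abort(self):
        self.tentative = None

    def rollback(self):
        # Recovery returns only to the last committed checkpoint.
        self.state = self.committed


def run_checkpoint_round(processes):
    """Initiator side: notify everyone, then commit only if all acknowledged."""
    acks = [p.take_tentative() for p in processes]
    if all(acks):
        for p in processes:
            p.commit()
        return True
    for p in processes:
        p.abort()
    return False


mh1, mh2 = Process("MH1"), Process("MH2")
mh1.state, mh2.state = 5, 7
committed = run_checkpoint_round([mh1, mh2])

mh1.state = 9   # further computation after the checkpoint
mh1.rollback()  # a failure sends MH1 back to its last saved state
```

Keeping the tentative and committed checkpoints separate means an incomplete round never overwrites the last saved state, which is what allows recovery to roll back to exactly one committed checkpoint per process.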

RELATED WORKS
SYSTEM MODEL AND ASSUMPTIONS
The Algorithm
Message m recorded received
Approach to Recovery
Algorithm for Detecting Recovery
Lost messages
WORKING OF ADHOCNET-BASED TMR
Checkpointing
Recovery
An Example Scenario
CONCLUSION
