Performance improvement in Distributed Systems through Replication and Checkpointing

Abhishek Raghuvanshi, Sourabh Dave

doi:10.5120/5800-8039

Abstract

Fault tolerance is an important issue in distributed systems. Many applications that run across several processors face problems of consistency and availability, and the failure of a single component can cause the complete process to fail. Many existing approaches to reliable execution are based on fault tolerance mechanisms. We discuss the basic concept of fault tolerance, which is to make a networked system able to keep working properly, possibly at somewhat reduced efficiency, when a fault occurs. A good fault-tolerant system also avoids further failures. After transient failures, the main problem is to bring the distributed system back to a consistent state. We address two parts of this problem: creating consistent checkpoints in a distributed system, and replication. We present an algorithm for replication and implement it in Java RMI. Using that algorithm, we do two things: first, the checkpoints are replicated, and second, the servers are replicated on different systems.
