Abstract
Fault tolerance is one of the most crucial concerns in distributed systems. A fault-tolerant system is very difficult to implement because of the dynamic nature and complex services of such systems. Research efforts are consistently being made to implement fault tolerance in distributed systems, and some recent surveys try to bring together the several fault tolerance architectures and methodologies that have been proposed. This paper gives a systematic and comprehensive interpretation of different fault types, their causes, and the various fault tolerance approaches used in distributed systems. It presents a broad survey of fault tolerance frameworks in the context of their basic approaches, fault applicability, and other key features. We investigate the different fault tolerance techniques used in distributed and scalable systems. Scalability is an important factor in distributed systems: it describes the ability of a system to dynamically adjust its own computing performance by changing the available computing resources and scheduling methods. The focus of this paper is on the types of faults that occur in the system and on fault detection techniques. A fault can occur because of a link failure or for any other reason. An appropriate fault detection technique can avoid losses and prevent system failure. The main objective of a fault-tolerant computer system is to continue operating uninterrupted despite the failure of one or more of its components. In the early days, computer systems were not distributed and did not share resources. Now, most computer systems are distributed, and their nodes work independently on a common task. So, if one node develops a fault, the other nodes take over the computation of the faulty node, and the user does not experience any problem with their tasks.