Abstract

Achieving processor cooperation in the presence of faults is a major problem in distributed systems. Popular paradigms such as agreement have been studied principally in the context of a complete network. Indeed, Dolev (J. Algorithms, 3 (1982), pp. 14-30) and Hadzilacos (Issues of Fault Tolerance in Concurrent Computations, Ph.D. thesis, Harvard University, Cambridge, MA, 1984) have shown that Ω(t) connectivity is necessary if the requirement is that all nonfaulty processors decide unanimously, where t is the number of faults to be tolerated. We believe that in foreseeable technologies the number of faults will grow with the size of the network while the degree will remain practically fixed. We therefore ask whether it is possible to avoid the connectivity requirements by slightly lowering our expectations. In many practical situations we may be willing to lose some correct processors and settle for cooperation among the vast majority of the processors. Thus motivated, we present a general simulation technique by which vertices (processors) in almost any network of bounded degree can simulate an algorithm designed for the complete network. The simulation has the property that although some correct processors may be cut off from the majority of the network by faulty processors, the vast majority of the correct processors will be able to communicate among themselves undisturbed by the (arbitrary) behavior of the faulty nodes. We define a new paradigm for distributed computing, almost-everywhere agreement, in which we require only that almost all correct processors reach consensus. Unlike the traditional agreement problem, almost-everywhere agreement can be solved on networks of bounded degree. Specifically, we can simulate any sufficiently resilient agreement algorithm on a network of bounded degree using the communication scheme described above.
Although we lose some correct processors, effectively treating them as faulty, the vast majority of correct processors decide on a common value.

1. Preliminaries. In 1982 Dolev (D) published the following damning result for distributed computing: Byzantine agreement is achievable only if the number of faulty processors in the system is less than one-half of the connectivity of the system's network. Even in the absence of malicious failures, connectivity t + 1 is required to achieve agreement in the presence of t faulty processors (H). These results are viewed as damning because of the fundamental nature of the agreement problem. In this problem each processor begins with an initial value drawn from some domain V of possible values. At some point during the computation, in which processors repeatedly exchange messages and perform local computations, each processor must irreversibly decide on a value, subject to two conditions: no two correct processors may decide on different values, and if all correct processors begin with the same value v, then v must be the common decision value. (See (F) for a survey of related problems.) The ability to achieve this type of coordination is important in a wide range of applications, such as database management, fault-tolerant analysis of sensor readings, and coordinated control of multiple agents.

A simple corollary of the results of Dolev and Hadzilacos is that in order for a system to be able to reach agreement in the presence of up to t faulty processors, every processor must be directly connected to at least Ω(t) others, since the connectivity of a network can never exceed its minimum degree. Such high connectivity, while feasible in a small system, cannot be implemented at reasonable cost in a large system. As technology improves, increasingly large distributed systems and parallel com-
