Efficient failure detection and consensus at extreme-scale systems

Soma Sekhar Kolisetty, Battula Srinivasa Rao

doi:10.11591/ijece.v12i5.pp5339-5347

Open Access

https://doi.org/10.11591/ijece.v12i5.pp5339-5347

Abstract

<span>Distributed and extreme-scale systems have become ubiquitous in recent years and are found throughout academic organizations, business, home, and government sectors. Peer-to-peer (P2P) technology is a typical distributed system model that is gaining popularity for delivering computing resources and services. Distributed systems try to remain available despite frequent component failures, and keeping the system functioning in such scenarios is notoriously difficult. To identify component failures in the system and achieve global agreement (consensus) on the failed components, this paper implements an efficient failure detection and consensus algorithm for fail-stop process failures. The proposed algorithm tolerates process failures that occur before and during its execution. It works with the epidemic gossip protocol, a randomized paradigm of computation and communication that is both fault-tolerant and scalable. A simulation of an extreme-scale information dissemination process shows that global agreement can be achieved. PeerSim, a P2P simulator, is used to implement and test the proposed algorithm. The results show that the algorithm is highly scalable while detecting all process failures. The status of all processes is maintained in a Boolean matrix.</span>
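
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm and does not use PeerSim. It assumes a cycle-driven push-pull gossip, a perfect detector for fail-stop processes (a contacted process that has failed never replies), and failures that all occur before the run starts. The function name `gossip_failure_detection`, its parameters, and the convergence check against the simulator's ground truth are hypothetical choices made for illustration.

```python
import random


def gossip_failure_detection(n, failed, seed=0, max_cycles=1000):
    """Simulate gossip-based detection of fail-stop failures (illustrative only).

    n       -- number of processes, identified as 0..n-1
    failed  -- set of process ids that have already stopped (assumption:
               no failures happen during the run)
    Returns (cycles_needed, suspects); cycles_needed is None if the cap is hit.
    """
    rng = random.Random(seed)
    alive = [p for p in range(n) if p not in failed]
    # Boolean matrix: suspects[i][j] is True when process i believes j has failed.
    suspects = [[False] * n for _ in range(n)]
    # Ground truth, known only to the simulator, used to detect convergence.
    target = [p in failed for p in range(n)]

    for cycle in range(1, max_cycles + 1):
        for i in alive:
            # Pick a uniformly random peer other than i.
            peer = rng.randrange(n - 1)
            if peer >= i:
                peer += 1
            if peer in failed:
                # No reply from a fail-stop process: record the suspicion.
                suspects[i][peer] = True
            else:
                # Push-pull exchange: both sides keep the OR of their rows.
                merged = [a or b for a, b in zip(suspects[i], suspects[peer])]
                suspects[i] = merged
                suspects[peer] = list(merged)
        # Agreement: every surviving process holds the same, complete row.
        if all(suspects[i] == target for i in alive):
            return cycle, suspects
    return None, suspects


if __name__ == "__main__":
    cycles, matrix = gossip_failure_detection(32, {3, 7, 20})
    print("cycles to agreement:", cycles)
```

Because suspicions are only ever added and merged with a logical OR, every surviving row grows monotonically toward the true set of failed processes, which is why agreement is eventually reached in this simplified setting.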

Full Text