Abstract

This paper considers an Internet agent system in which a very large number of agents operate, frequently appearing and disappearing, and discusses a fault-tolerant algorithm for such a system. The application of a snapshot algorithm, which captures the global state (snapshot) of a distributed system, to the agent system is considered. The snapshot algorithm of Chandy and Lamport [2] is taken as the representative snapshot algorithm because of its high efficiency and simple procedure. It is not practical, however, to apply their algorithm directly to a distributed agent system in which a very large number of agents operate. This paper therefore extends the idea of Chandy and Lamport's algorithm and proposes a subsnapshot algorithm, in which the snapshot is taken only among agents that are causally related through message exchange and agent creation. An efficient rollback algorithm based on the snapshots taken by the subsnapshot algorithm is then proposed. In a general snapshot-based rollback algorithm, all agents must roll back; in the rollback algorithm proposed in this paper, only some of the agents need to roll back.

© 2005 Wiley Periodicals, Inc. Electron Comm Jpn Pt 3, 88(12): 43–57, 2005; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjc.20208
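
The abstract does not spell out how the set of causally related agents is determined. As a rough illustration only, and not the paper's subsnapshot algorithm, the sketch below computes the group of agents reachable from a starting agent through recorded message-exchange and agent-creation events. The function name `causal_group`, the event tuple layout, and the choice to treat both kinds of event as undirected links are all assumptions made for this example.

```python
from collections import defaultdict, deque


def causal_group(start, events):
    """Return the set of agents linked to `start` by recorded events.

    `events` is an iterable of (kind, a, b) tuples, where kind is
    "message" (a sent a message to b) or "create" (a created b).
    Assumption: both kinds are treated as undirected links, which is
    a simplification and may not match the paper's causal relation.
    """
    neighbours = defaultdict(set)
    for _kind, a, b in events:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Breadth-first search over the link graph, starting from `start`.
    group = {start}
    queue = deque([start])
    while queue:
        agent = queue.popleft()
        for other in neighbours[agent]:
            if other not in group:
                group.add(other)
                queue.append(other)
    return group


# Hypothetical event log: A and B exchanged a message, B created C,
# and D and E exchanged a message independently of the others.
events = [
    ("message", "A", "B"),
    ("create", "B", "C"),
    ("message", "D", "E"),
]
```

Under these assumptions, a snapshot or rollback started at agent `A` would involve only `A`, `B`, and `C`, while `D` and `E` would be left untouched, which is the intuition behind having only some agents roll back.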