Stateless Node Failure Information Propagation Scheme for Stable Overlay Networks

Kimihiro Mizutani

doi:10.1109/access.2021.3090028

Abstract

Structured overlay technology offers fault tolerance and efficient discovery of computation resources (i.e., nodes) in distributed data storage and computation platforms; however, these strengths are guaranteed only in stable environments where node failures are infrequent. To cope with unstable environments, many advanced schemes have been proposed that build on the well-known node-failure information propagation scheme, which stabilizes the platform by handling node failures quickly. In the existing scheme, a computation node propagates node-failure information as soon as it detects a failure. However, the existing scheme requires stateful maintenance of the propagation targets; in other words, a node must maintain network connections both to the propagation target nodes and to the nodes held on the general overlay. The nodes then exhaust machine resources (e.g., CPU, memory, network bandwidth) on connection management and cannot concentrate on their own tasks, such as data analysis or storage applications. To resolve this problem, I propose a stateless node-failure information propagation scheme, which propagates a node failure as fast as the existing scheme but without maintaining connections to the propagation targets. With the proposed scheme, each computation node can use its machine resources effectively for its own task. Instead of retaining the propagation targets, my scheme estimates them after detecting a node failure. I analyzed the estimation accuracy of a simple propagation model, which guarantees effective propagation. The accuracy was found to depend on the overlay distance between the failed node and the propagator node. Based on this observation, my scheme adjusts the keep-alive interval to bias detection toward failures of closer nodes. In a simulation evaluation, the proposed stateless propagation achieved a detection delay similar to that of the stateful propagation scheme while delivering superior maintenance cost and scalability.
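The idea of biasing the keep-alive interval by overlay distance can be illustrated with a minimal Python sketch. This is not the rule used in the paper, which the abstract does not spell out. The ring-shaped identifier space, the function names `ring_distance` and `keepalive_interval`, the logarithmic growth rule, and the 1 s and 30 s bounds are all assumptions chosen only to show the general shape of such an adjustment.

```python
def ring_distance(src: int, dst: int, bits: int) -> int:
    """Clockwise identifier distance from src to dst on a ring of size 2**bits."""
    return (dst - src) % (1 << bits)


def keepalive_interval(distance: int, bits: int,
                       min_interval: float = 1.0,
                       max_interval: float = 30.0) -> float:
    """Hypothetical rule: probe close nodes often and distant nodes rarely.

    The interval grows linearly with the number of bits needed to express
    the distance. This growth rule is an assumption for illustration only.
    """
    level = distance.bit_length()  # 0 for distance 0, at most `bits`
    return min_interval + (max_interval - min_interval) * level / bits


if __name__ == "__main__":
    BITS = 8   # assumed identifier length
    me = 250   # assumed identifier of the probing node
    for peer in (251, 5, 60, 122):
        d = ring_distance(me, peer, BITS)
        print(f"peer={peer:3d} distance={d:3d} "
              f"interval={keepalive_interval(d, BITS):.2f}s")
```

Under these assumptions, a peer one step away is probed every few seconds, while a peer half a ring away is probed at the maximum interval, so failures of closer nodes tend to be noticed sooner.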

Highlights

Large-scale deep learning architectures [1]–[3] and distributed key-value stores are well-known use cases of distributed computing technologies [4], [5].
A source node sends a SYN message to a target node contained in its routing table and confirms the status of that node by checking for an ACK message in response.
The schemes have their own advantages and disadvantages. The former scheme adjusts the keep-alive interval based on node behavior, which improves the detection delay and reduces the number of wasted maintenance messages only when the node behavior matches the assumed model.
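The SYN/ACK keep-alive check mentioned in the highlights can be sketched as follows. The transport is simulated by a plain callable so that the example runs without a network. The names `probe`, `sweep`, and `make_fake_transport` are hypothetical, and treating a missing reply as a failure after a single probe is a simplifying assumption.

```python
from typing import Callable, Iterable, List, Optional, Set

# A transport takes a node name and returns the reply, or None on timeout.
Transport = Callable[[str], Optional[str]]


def probe(target: str, send_syn: Transport) -> bool:
    """Return True if the target answered the SYN with an ACK."""
    return send_syn(target) == "ACK"


def sweep(routing_table: Iterable[str], send_syn: Transport) -> List[str]:
    """Probe every routing-table entry and return the nodes that did not ACK."""
    return [node for node in routing_table if not probe(node, send_syn)]


def make_fake_transport(alive: Set[str]) -> Transport:
    """Simulated network: only nodes in `alive` answer with an ACK."""
    def send_syn(target: str) -> Optional[str]:
        return "ACK" if target in alive else None
    return send_syn


if __name__ == "__main__":
    transport = make_fake_transport(alive={"node-a", "node-c"})
    suspected = sweep(["node-a", "node-b", "node-c"], transport)
    print("suspected failures:", suspected)
```

A real deployment would replace the fake transport with an actual message exchange and would usually retry before declaring a node failed.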

Summary

INTRODUCTION

Large-scale deep learning architectures [1]–[3] and distributed key-value stores are well-known use cases of distributed computing technologies [4], [5]. A structured overlay network satisfies the stability and connectivity requirements by providing an effective routing function for distributed computing platforms. On such a network, looking up a target node requires only O(log N) messages [6]–[10]. A node that detects a node failure propagates the failure information to the other nodes that hold the failed node's state in their routing tables. The propagation scheme adopts a stateful mechanism that forces a node to permanently maintain and update both its routing table and its propagation targets (i.e., back pointers). I propose an effective “stateless” propagation scheme for node-failure information that detects node failures early without maintaining the propagation targets.
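To make the contrast between stored back pointers and estimated propagation targets concrete, the following Python sketch uses a Chord-style identifier ring. The summary does not say which overlay or which estimation rule the paper uses, so the ring, the finger rule `successor(n + 2**i)`, and the function names are assumptions for illustration only. The sorted global node list stands in for overlay lookups that a real node would have to issue.

```python
from bisect import bisect_left, bisect_right
from typing import List, Set


def successor(nodes: List[int], key: int, bits: int) -> int:
    """First node at or after `key` on the ring (nodes must be sorted)."""
    key %= 1 << bits
    return nodes[bisect_left(nodes, key) % len(nodes)]


def predecessor_or_self(nodes: List[int], key: int, bits: int) -> int:
    """Last node at or before `key` on the ring (nodes must be sorted)."""
    key %= 1 << bits
    return nodes[bisect_right(nodes, key) - 1]  # index -1 wraps to the last node


def exact_back_pointers(nodes: List[int], failed: int, bits: int) -> Set[int]:
    """Stateful view: every node that holds `failed` as one of its fingers."""
    return {n for n in nodes if n != failed
            and any(successor(nodes, n + 2 ** i, bits) == failed
                    for i in range(bits))}


def estimated_targets(nodes: List[int], failed: int, bits: int) -> Set[int]:
    """Stateless guess: the node nearest before failed - 2**i, for each i."""
    return {predecessor_or_self(nodes, failed - 2 ** i, bits)
            for i in range(bits)} - {failed}


if __name__ == "__main__":
    BITS = 4                      # assumed 16-slot identifier ring
    ring = [1, 4, 7, 9, 12, 14]   # assumed node identifiers, sorted
    print("exact    :", sorted(exact_back_pointers(ring, 9, BITS)))
    print("estimated:", sorted(estimated_targets(ring, 9, BITS)))
```

On this small ring the estimate happens to coincide with the exact back-pointer set. In general, an estimate of this kind can miss some holders of the failed node or include nodes that do not hold it, which is why the accuracy of the estimation matters.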

RELATED WORK

ALGORITHM OF THE STATELESS PROPAGATION SCHEME

KEEP-ALIVE INTERVAL ADJUSTMENT

PERFORMANCE EVALUATION

SCALABILITY

SUMMARY OF THE EXPERIMENTS

Findings

CONCLUSION

Journal: IEEE access : practical innovations, open solutions
Publication Date: Jan 1, 2021
License type: CC BY 4.0

