Abstract

Structured overlay technology offers fault tolerance and computation resource (i.e., node) discovery for distributed data storage and computation platforms. However, these strengths are guaranteed only in stable environments where node failures are infrequent. To cope with unstable environments, many advanced schemes based on the well-known node-failure information propagation scheme have been proposed; they stabilize the platform by handling node failures quickly. In the existing scheme, a computation node propagates node-failure information when it detects a failure. However, the existing scheme requires stateful maintenance of the propagation targets; in other words, each node must maintain network connections to both the propagation target nodes and the nodes held in the general overlay. The nodes then exhaust their machine resources (e.g., CPU, memory, network bandwidth) on connection management and cannot concentrate on their own tasks, such as data analysis or storage applications. To resolve this problem, I propose a stateless node-failure information propagation scheme, which propagates a node failure at the speed of the existing scheme but without requiring maintenance of the propagation target connections. In the proposed scheme, each computational node can effectively utilize its machine resources for its own task. Instead of retaining the propagation targets, my scheme estimates them after detecting a node failure. I analyzed the estimation accuracy of a simple propagation model, which guarantees effective propagation. The accuracy was found to depend on the overlay distance between the failed node and the propagator node. Based on this observation, my scheme adjusts the keep-alive interval to bias the detection toward failures of closer nodes.
In a simulation evaluation, the proposed stateless propagation achieved a detection delay similar to that of the stateful propagation scheme while delivering superior maintenance cost and scalability.
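
As a rough illustration of biasing the keep-alive interval by overlay distance, the sketch below probes nearer nodes more often than distant ones. It is not the scheme's actual adjustment rule: the ring-shaped identifier space, the logarithmic scaling, and the names `ring_distance`, `keepalive_interval`, `min_interval`, and `max_interval` are all assumptions made for this example.

```python
def ring_distance(src: int, dst: int, id_bits: int) -> int:
    """Clockwise distance from src to dst on a ring of size 2**id_bits (assumed ID space)."""
    return (dst - src) % (1 << id_bits)


def keepalive_interval(src: int, dst: int, id_bits: int,
                       min_interval: float = 1.0,
                       max_interval: float = 8.0) -> float:
    """Hypothetical rule: interpolate the probe interval by log2 of the distance.

    Closer nodes get a shorter interval, so their failures are detected sooner.
    """
    d = ring_distance(src, dst, id_bits)
    if d == 0 or id_bits <= 1:
        return min_interval
    level = d.bit_length() - 1          # floor(log2(d)), in 0 .. id_bits - 1
    frac = level / (id_bits - 1)        # normalised to 0.0 .. 1.0
    return min_interval + frac * (max_interval - min_interval)
```

With an 8-bit identifier space and the default bounds, a neighbor at distance 1 is probed at the minimum interval and a node half a ring away at the maximum; any monotonic mapping from distance to interval would serve the same illustrative purpose.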

Highlights

  • Large-scale deep learning architectures [1]–[3] and distributed key-value stores are well-known use cases of distributed computing technologies [4], [5]

  • A source node sends a SYN message to a target node contained in its routing table and confirms the status of the node by checking for its ACK response

  • Each scheme has its own advantages and disadvantages. The former scheme adjusts the keep-alive interval based on node behavior, which improves the detection delay and the number of wasted maintenance messages only when the node behavior matches the assumed model


Summary

INTRODUCTION

Large-scale deep learning architectures [1]–[3] and distributed key-value stores are well-known use cases of distributed computing technologies [4], [5]. A structured overlay network satisfies the stability and connectivity requirements by providing an effective routing function for distributed computing platforms. On such a network, looking up a target node requires only O(log N) messages [6]–[10]. A node that detects a node failure propagates the failure information to the other nodes that hold the failed node's state in their routing tables. The propagation scheme adopts a stateful mechanism that forces a node to permanently maintain and update both its routing table and its propagation targets (i.e., back pointers). I propose an effective "stateless" propagation scheme for node-failure information that detects node failures early without maintaining the propagation targets.
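
To make the idea of estimating propagation targets instead of storing back pointers concrete, here is a minimal sketch on a Chord-like ring. It is an assumption-laden illustration, not the algorithm of the proposed scheme: it assumes finger tables of the form successor(n + 2^i), and the helper names `successor`, `finger_table`, `true_holders`, and `estimate_targets` are hypothetical. For each finger level, the propagator guesses that the live node at or just before `failed - 2**i` holds the failed node in its routing table.

```python
from bisect import bisect_left, bisect_right


def successor(sorted_ids, key, id_bits):
    """First node at or clockwise after key on a ring of size 2**id_bits."""
    key %= 1 << id_bits
    i = bisect_left(sorted_ids, key)
    return sorted_ids[i % len(sorted_ids)]


def finger_table(sorted_ids, node, id_bits):
    """Chord-style routing table (assumed): entry i is successor(node + 2**i)."""
    return [successor(sorted_ids, node + (1 << i), id_bits) for i in range(id_bits)]


def true_holders(sorted_ids, failed, id_bits):
    """Ground truth a stateful scheme would track: nodes whose table contains `failed`."""
    return {n for n in sorted_ids
            if n != failed and failed in finger_table(sorted_ids, n, id_bits)}


def estimate_targets(live_ids, failed, id_bits):
    """Stateless guess: one candidate per finger level, no back pointers stored."""
    ring = 1 << id_bits
    targets = set()
    for i in range(id_bits):
        guess = (failed - (1 << i)) % ring
        j = bisect_right(live_ids, guess)   # index just past the last id <= guess
        targets.add(live_ids[j - 1])        # j == 0 wraps around to the largest id
    return targets
```

On a 4-bit ring with nodes 1, 4, 9, 11, and 14 where node 9 fails, the nodes that actually hold 9 are 1, 4, and 14, while this naive estimate returns only 1 and 4. A one-candidate-per-level guess can therefore miss holders, so a practical scheme needs some way to account for estimation error.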

RELATED WORK
ALGORITHM OF THE STATELESS PROPAGATION SCHEME
KEEP-ALIVE INTERVAL ADJUSTMENT
PERFORMANCE EVALUATION
SCALABILITY
SUMMARY OF THE EXPERIMENTS
Findings
CONCLUSION