Abstract

Failures are not uncommon in production data center networks (DCNs), and it takes a long time for the network to recover from a failure and find new forwarding paths, which significantly impacts real-time and interactive applications at the upper layer. This slow failure recovery has two primary causes. First, DCNs with a multi-rooted tree topology lack immediate backup paths for downward links. Second, the distributed routing protocols used in DCNs take time to converge after failures. In this paper, we present a fault-tolerant DCN solution, called F2Tree, that can significantly improve failure recovery time in current DCNs through only a small amount of link rewiring and a few switch configuration changes. Because F2Tree does not change any existing software or hardware, it can be readily deployed in production DCNs, something other existing proposals fail to achieve. Through testbed and emulation experiments, we show that F2Tree reduces failure recovery time by 78%. Our experimental results also show that, for partition-aggregate applications (popular in DCNs) under various failure conditions, F2Tree reduces the ratio of deadline-missing requests by more than 96% compared to current DCNs.
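The first cause above, that downward links in a multi-rooted tree have no immediate backup, can be illustrated by counting shortest-path next hops toward one destination. The sketch below is a minimal illustration under stated assumptions: it uses a standard k-ary fat-tree as the multi-rooted tree (the abstract does not name a specific topology) and ECMP-style shortest-path forwarding. The switch naming and the helpers `build_fat_tree` and `next_hops` are hypothetical, and the code does not model F2Tree's rewiring itself.

```python
from collections import deque


def build_fat_tree(k):
    """Return an adjacency map for a k-ary fat-tree (k must be even).

    Assumption: standard fat-tree wiring, used only to illustrate a
    multi-rooted tree; this is not F2Tree's rewired topology.
    """
    assert k % 2 == 0, "fat-tree arity k must be even"
    half = k // 2
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for pod in range(k):
        for a in range(half):
            agg = ("agg", pod, a)
            # Each aggregation switch connects to every ToR switch in its pod.
            for e in range(half):
                link(agg, ("tor", pod, e))
            # Aggregation switch a connects to core group a (k/2 core switches).
            for c in range(half):
                link(agg, ("core", a * half + c))
    return adj


def next_hops(adj, dst):
    """For every node, the set of neighbors on a shortest path to dst."""
    dist = {dst: 0}
    queue = deque([dst])
    while queue:  # breadth-first search outward from the destination
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return {
        node: {n for n in adj[node] if dist[n] == dist[node] - 1}
        for node in adj
        if node != dst
    }


if __name__ == "__main__":
    k = 4
    hops = next_hops(build_fat_tree(k), ("tor", 0, 0))
    # Upward hops (toward the core) have k/2 equal-cost choices...
    print("ToR in another pod:", len(hops[("tor", 1, 0)]))
    print("agg in another pod:", len(hops[("agg", 1, 0)]))
    # ...but each downward hop has exactly one choice, so a failed
    # downward link leaves no immediate local backup path.
    print("core switch:", len(hops[("core", 0)]))
    print("agg in dst pod:", len(hops[("agg", 0, 0)]))
```

With k = 4, the two upward counts are 2 (that is, k/2) and the two downward counts are 1, which matches the claim that only the downward direction lacks an immediate alternative; recovering from a downward failure therefore depends on routing reconvergence.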
