Abstract

With the rapid growth of backbone and data center networks, ensuring network robustness under various failure scenarios has become a key challenge in network design. The combinatorial nature of failure scenarios in the data plane, control plane, and management plane seriously challenges existing practice in robust network design, which often requires verifying the designed network's performance by enumerating all possible failure combinations. Meanwhile, machine learning (ML) has been applied to many networking problems with tremendous success. In this article, we present a general approach to leveraging machine learning to support robust network design. First, we give a selective overview of current work on robust network design and show that failure evaluation provides a common kernel for improving the tractability and scalability of existing solutions. Then we propose a function approximation of the common kernel based on a graph attention network (GAT) to efficiently evaluate the impact of various potential failure scenarios and identify critical failures that may have significant consequences. The function approximation allows us to obtain new models of three important robust network design problems and to solve them efficiently by evaluating the solutions against a pruned set of critical failures. We evaluate our approach in the three use cases and demonstrate a significant reduction in time-to-solution with a minimal performance gap. Finally, we discuss how the proposed framework can be applied to many other robust network design problems.

