Fast recovery from hard and soft failures is crucial for network operators to deliver uninterrupted services. Streaming telemetry has been studied as a means of enabling fast and accurate failure detection in optical networks. However, significant delay is incurred when relying on a centralized entity (e.g., a software-defined network controller) to collect, process, and act on telemetry data. Programmable switches (e.g., P4-based) allow telemetry data to be processed at line speed, enabling local, on-device (distributed) decisions. These devices can be used to deploy fast, local mitigation of failures while a global solution is computed on a longer time scale. However, designing network-wide streaming telemetry with distributed decisions remains an open challenge. In this work, we specify the joint optimization of packet-optical networks with on-device failure recovery, considering multiple aspects of the problem. The problem is modeled using linear programming and solved for multiple network realizations. The solutions can be used to program each switch in the network to detect failures and quickly recover the traffic. Results show that the proposed model reduces the number of register entries required to store telemetry data while ensuring high recoverability and minimizing the number of wavelengths.