Network measurement provides rich data for network monitoring, control, and management. In-band network telemetry (INT) is a new network measurement technology that uses normal data packets to collect network information hop by hop. However, the design and implementation of the INT protocol offer no way to handle packet loss: (1) the end-to-end telemetry mechanism leaves INT unable to detect packet loss; (2) because data packets may be lost for various reasons, the INT telemetry information they carry is inevitably lost with them. In short, an INT system is unreliable by itself, and incomplete telemetry data seriously degrades the performance of upper-layer network telemetry applications. In this paper, we present our experience with INT packet loss monitoring. We design, implement, and open-source LossSight, a packet loss monitoring system for INT. LossSight detects packet loss events, infers the time and location of the losses, diagnoses their root cause, and recovers the lost INT information. Experimental results show that LossSight achieves detection accuracy and diagnostic precision close to 100% and detection latency of just milliseconds, with extremely low overhead. In particular, LossSight uses a generative adversarial network to recover lost telemetry information with high accuracy and reliability. LossSight has been running stably in the supercomputing interconnection environment of the National Supercomputing Center in Jinan. We suggest that all INT applications that require reliable telemetry information be implemented on top of LossSight.