Network monitoring systems are a key building block in today's networks. They all follow a common framework where measurement data from network elements is aggregated at a central collector for network-wide visibility. When designing network monitoring systems, two key properties have to be taken into account: (1) efficiency, to minimize the communication overhead from network elements to the collector; (2) high-fidelity, to faithfully represent the network status. However, in presence of network dynamics, tracking the right operating point to ensure both high fidelity and efficiency is hard and we observe that prior monitoring approaches trade off one for the other. In this paper, we show that it is possible to satisfy both these properties with NetGSR, a new deep learning based solution we introduce that reconstructs the fine-grained behavior of network status at the collector while requiring low resolution measurement data from network elements. This is achieved through a combination of a new custom-tailored conditional deep generative model (DistilGAN), and a new feedback mechanism (Xaminer) based on model uncertainty estimation and denoising that allows the collector to adjust the sampling rate for measurement data from network elements, at run-time. We extensively evaluate NetGSR using three different network scenarios with corresponding real-world network monitoring datasets as well as two downstream use cases. We show that NetGSR can faithfully reconstruct fine-grained network status with 25x greater measurement efficiency than prior approaches while requiring only few ms of inference time at the collector.
Read full abstract