Abstract

Recent advances in Graph Neural Networks (GNNs) have opened new capabilities for analyzing complex communication systems. However, little work has studied the effects of limited data samples on the performance of GNN-based systems. In this paper, we present a novel solution to the problem of finding an optimal training set for efficient training of a RouteNet-Fermi GNN model. The proposed solution ensures good model generalization to large, previously unseen networks under strict limitations on the training data budget and training topology sizes. Specifically, we generate an initial data set by emulating the flow distribution of large networks while using small networks. We then deploy a new clustering method that efficiently samples this generated data set by analyzing the data embeddings produced by different Oracle models. This procedure yields a very small but information-rich training set. The embedding method translates highly heterogeneous network samples into a common embedding space, in which the samples can be easily related to each other. The proposed method outperforms state-of-the-art approaches, including the winning solutions of the 2022 Graph Neural Networking challenge.
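The abstract describes selecting a small, information-rich training subset by clustering sample embeddings. As an illustration only (the paper's exact procedure, the Oracle models, and the embedding space are not specified here), the following sketch uses plain Lloyd's k-means over hypothetical embedding vectors and keeps the one sample nearest each centroid, so a budget of k clusters yields at most k representative training samples:

```python
import numpy as np

def select_training_subset(embeddings, budget, n_iter=50, seed=0):
    """Illustrative sketch: cluster sample embeddings with Lloyd's k-means,
    then keep the single sample closest to each centroid as the training set.
    `embeddings` is an (n_samples, dim) array; `budget` is the target set size."""
    rng = np.random.default_rng(seed)
    n = len(embeddings)
    # initialize centroids from randomly chosen samples
    centroids = embeddings[rng.choice(n, size=budget, replace=False)].copy()
    for _ in range(n_iter):
        # assign every sample to its nearest centroid
        dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned samples
        for k in range(budget):
            members = embeddings[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    # one representative per cluster: the sample nearest its final centroid
    dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=2)
    return sorted(set(dists.argmin(axis=0).tolist()))
```

This is a generic subset-selection sketch under assumed inputs, not the paper's algorithm; the actual method samples embeddings produced by multiple Oracle models.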

Full Text: published version (free).