Generative transformer framework for network traffic generation and classification

Radion F Bikmukhamedo, Adel F Nadeev

doi:10.36724/2072-8735-2020-14-11-64-71

Abstract

This paper presents a generative transformer framework for network traffic generation and classification tasks. Only packet size and inter-packet time sequences are used as flow features, which unifies the inputs for the two tasks. The source feature space is scaled and clustered with K-Means to form discrete sequences that serve as model inputs. The model can be trained in two modes: (i) autoregressively, for network traffic generation, where the first token of a training sequence represents the flow class, and (ii) as a network flow classifier. Evaluation of the generated traffic with the Kolmogorov-Smirnov statistic showed that its quality is on par with that of a first-order Markov chain trained on each traffic class independently. The statistic measured the distance between the source and generated empirical cumulative distributions of packet size, inter-arrival time, throughput, and number of packets per flow, in the directions to and from the traffic origin. Enriching the dataset with external traffic from a different domain improved the quality of the generated traffic on the target classes. The experiments also showed that generative pre-training improves the quality of traffic classification. When the pre-trained model is used as a feature extractor for a linear algorithm, the quality is close to that of a Random Forest trained on the raw sequences. When all model parameters are trained, the classifier outperforms the ensemble by 4% on average according to the F1-macro metric.
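As an illustration of the evaluation metric named in the abstract, the two-sample Kolmogorov-Smirnov statistic is the largest absolute gap between two empirical cumulative distribution functions. The sketch below is a minimal pure-Python version and is not taken from the paper: the function name `ks_statistic` and the sample packet sizes are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def ks_statistic(source, generated):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_source(x) - F_generated(x)|."""
    if not source or not generated:
        raise ValueError("both samples must be non-empty")
    src = sorted(source)
    gen = sorted(generated)
    # Both ECDFs are step functions, so the largest gap occurs at an observed value.
    points = sorted(set(src) | set(gen))
    return max(
        # bisect_right counts the values <= x, i.e. the ECDF numerator at x.
        abs(bisect_right(src, x) / len(src) - bisect_right(gen, x) / len(gen))
        for x in points
    )


# Hypothetical packet sizes in bytes (illustrative values, not data from the paper).
source_sizes = [60, 60, 590, 1500, 1500]
generated_sizes = [60, 590, 590, 1500, 1500]
print(ks_statistic(source_sizes, generated_sizes))
```

A value of 0 means the two empirical distributions coincide, and a value of 1 means their supports do not overlap at all. The same function could be applied to any of the per-flow parameters listed in the abstract, such as inter-arrival time or number of packets per flow.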

Full Text