Weakly Supervised Learning for Network Traffic Classification

Onur Barut, Peilong Li, Tong Zhang

doi:10.1109/nas55553.2022.9925450

Abstract

With advances in deep learning methods and the enormous amount of available network traffic data, training deep neural networks for malware traffic classification directly on raw traffic data has gained popularity and success. Obtaining labeled data to train deeper models, on the other hand, has become a significant challenge. Weakly supervised learning approaches have grown more popular as a solution to this problem. These approaches aim to improve the accuracy of classifiers previously trained on a small quantity of labeled data by exploiting widely available unlabeled data. Existing approaches such as Snorkel rely on labeling functions defined manually from hand-crafted features. We instead propose employing several deep models as labeling functions to estimate labels for the unlabeled data. These estimated labels are then used to tune the classifier and increase its accuracy for network traffic classification. In a multi-class classification scenario, our findings demonstrate that using deep models to label the unlabeled portion of the data can improve accuracy by 1.5% and F1-score by 5%.
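The abstract does not specify how the deep models' outputs are combined into a single label. The following is a minimal sketch of the general idea of treating trained models as labeling functions, assuming each model exposes per-class probabilities. It uses a plain majority vote as a stand-in for Snorkel's learned label model. The helper names (`model_as_lf`, `label_matrix`, `majority_vote`, `weakly_label`), the `ABSTAIN = -1` convention, the 0.5 confidence threshold, and the toy models `m1`–`m3` are all hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

ABSTAIN = -1  # assumed convention: a labeling function may decline to vote

Features = Sequence[float]
ProbaFn = Callable[[Features], Sequence[float]]  # stand-in for a trained deep model
LabelingFunction = Callable[[Features], int]


def model_as_lf(predict_proba: ProbaFn, min_confidence: float = 0.5) -> LabelingFunction:
    """Wrap a classifier as a labeling function (hypothetical helper).

    The model votes for its arg-max class only if that class's probability
    reaches min_confidence; otherwise it abstains.
    """
    def lf(x: Features) -> int:
        probs = list(predict_proba(x))
        best = max(range(len(probs)), key=probs.__getitem__)
        return best if probs[best] >= min_confidence else ABSTAIN
    return lf


def label_matrix(lfs: Sequence[LabelingFunction],
                 samples: Sequence[Features]) -> List[List[int]]:
    """One row per unlabeled sample, one column per labeling function."""
    return [[lf(x) for lf in lfs] for x in samples]


def majority_vote(matrix: Sequence[Sequence[int]]) -> List[int]:
    """Simple vote; abstain if no function voted or the top two classes tie."""
    labels: List[int] = []
    for row in matrix:
        votes = Counter(v for v in row if v != ABSTAIN).most_common()
        if not votes or (len(votes) > 1 and votes[0][1] == votes[1][1]):
            labels.append(ABSTAIN)
        else:
            labels.append(votes[0][0])
    return labels


def weakly_label(lfs: Sequence[LabelingFunction],
                 unlabeled: Sequence[Features]) -> List[Tuple[Features, int]]:
    """Return (sample, weak label) pairs, dropping samples left unlabeled."""
    y = majority_vote(label_matrix(lfs, unlabeled))
    return [(x, label) for x, label in zip(unlabeled, y) if label != ABSTAIN]


# Toy stand-ins for three trained models over 3 traffic classes (hypothetical).
def m1(x):
    return [0.8, 0.1, 0.1] if x[0] < 0.5 else [0.1, 0.2, 0.7]


def m2(x):
    return [0.6, 0.3, 0.1] if x[0] < 0.6 else [0.2, 0.7, 0.1]


def m3(x):
    return [0.4, 0.3, 0.3] if x[0] < 0.5 else [0.1, 0.1, 0.8]


lfs = [model_as_lf(m) for m in (m1, m2, m3)]
print(weakly_label(lfs, [[0.2], [0.55], [0.9]]))
# prints [([0.2], 0), ([0.55], 2), ([0.9], 2)]
```

The surviving (sample, label) pairs would then serve as extra training data for tuning the classifier. Majority vote weights every model equally. Snorkel's label model instead estimates each labeling function's accuracy, which may matter when the models disagree often.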

Full Text