Abstract

Network traffic analytics has become a crucial task for better understanding and managing network resources, especially in the era of network softwarization, in which such analytics can be implemented easily through network function virtualization. Many approaches have been proposed to improve the performance of traffic classification. However, new types of traffic emerge every day, and they are generally unlabeled, which poses a new challenge. Another important concern is how to classify traffic accurately using only a limited amount of labeled or partially labeled data, since labeling data is often difficult and time-consuming. To address these issues, we reformulate traffic classification as a semi-supervised learning problem that combines supervised learning (using labeled data) and unsupervised learning (using unlabeled data). To this end, this paper presents a semi-supervised deep-learning model for traffic classification based on a stacked sparse autoencoder (SSAE). The main motivations for this approach are: (i) unlabeled data is often abundant and easily available; (ii) the classification performance of the whole model can be greatly improved when a large amount of unlabeled traffic is included in the training process; and (iii) there is a limit to how much human effort can be devoted to labeling. To investigate the performance of our approach, we conducted an empirical study on a real dataset; the results indicate that using a large amount of unlabeled data in the SSAE pre-training phase can significantly improve the classification performance of the whole model. Furthermore, the proposed approach is compared against other representative machine-learning and deep-learning models, namely Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Multi-Layer Perceptron (MLP), eXtreme Gradient Boosting (XGBoost), and Autoencoder.
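The general idea of pre-training on unlabeled traffic and then classifying with a small labeled set can be illustrated with a short plain-Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `SparseAutoencoder`, `train_logistic` and `semi_supervised_ssae`, the layer sizes `(6, 3)`, the sparsity target `rho`, the penalty weight `beta` and the toy two-class "flow features" are all hypothetical. Each autoencoder layer is trained greedily on inputs without labels, using a squared reconstruction loss plus a KL-divergence sparsity penalty. Labels are used only afterwards, to fit a binary logistic-regression head on the encoded features of the labeled set. For brevity, the pre-trained encoder is kept frozen rather than fine-tuned end to end.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def affine(W, x, b):
    # W @ x + b, with the weight matrix W stored as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

class SparseAutoencoder:
    """Hypothetical one-hidden-layer autoencoder with a KL-divergence sparsity penalty."""

    def __init__(self, n_in, n_hidden, rho=0.05, beta=3.0, seed=0):
        rng = random.Random(seed)
        r = math.sqrt(6.0 / (n_in + n_hidden + 1))
        self.W1 = [[rng.uniform(-r, r) for _ in range(n_in)] for _ in range(n_hidden)]
        self.W2 = [[rng.uniform(-r, r) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b1, self.b2 = [0.0] * n_hidden, [0.0] * n_in
        self.rho, self.beta = rho, beta

    def encode(self, x):
        return [sigmoid(z) for z in affine(self.W1, x, self.b1)]

    def fit(self, X, epochs=100, lr=0.5):
        # Full-batch gradient descent on mean reconstruction error + beta * sum_j KL(rho || rho_hat_j).
        n, n_h, n_in = len(X), len(self.b1), len(self.b2)
        for _ in range(epochs):
            H = [self.encode(x) for x in X]
            # rho_hat_j: mean activation of hidden unit j, clipped away from 0 and 1.
            rho_hat = [min(max(sum(h[j] for h in H) / n, 1e-6), 1 - 1e-6) for j in range(n_h)]
            # Derivative of the KL sparsity penalty with respect to each rho_hat_j.
            sp = [self.beta * (-self.rho / p + (1 - self.rho) / (1 - p)) for p in rho_hat]
            gW1 = [[0.0] * n_in for _ in range(n_h)]
            gW2 = [[0.0] * n_h for _ in range(n_in)]
            gb1, gb2 = [0.0] * n_h, [0.0] * n_in
            for x, h in zip(X, H):
                xhat = [sigmoid(z) for z in affine(self.W2, h, self.b2)]
                d3 = [(xh - xi) * xh * (1 - xh) for xh, xi in zip(xhat, x)]
                d2 = [(sum(self.W2[i][j] * d3[i] for i in range(n_in)) + sp[j]) * h[j] * (1 - h[j])
                      for j in range(n_h)]
                for i in range(n_in):
                    gb2[i] += d3[i]
                    for j in range(n_h):
                        gW2[i][j] += d3[i] * h[j]
                for j in range(n_h):
                    gb1[j] += d2[j]
                    for k in range(n_in):
                        gW1[j][k] += d2[j] * x[k]
            # Average gradients over the batch; biases are treated as one-row matrices.
            for P, G in ((self.W1, gW1), (self.W2, gW2), ([self.b1], [gb1]), ([self.b2], [gb2])):
                for row, grow in zip(P, G):
                    for k in range(len(row)):
                        row[k] -= lr * grow[k] / n
        return self

def train_logistic(F, y, epochs=300, lr=0.5):
    # Binary logistic-regression head fitted on frozen encoded features.
    w, b = [0.0] * len(F[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for f, t in zip(F, y):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - t
            gw, gb = [g + err * fi for g, fi in zip(gw, f)], gb + err
        w = [wi - lr * g / len(F) for wi, g in zip(w, gw)]
        b -= lr * gb / len(F)
    return lambda f: int(sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) >= 0.5)

def semi_supervised_ssae(X_unlabeled, X_labeled, y_labeled, sizes=(6, 3)):
    layers, inputs = [], X_unlabeled + X_labeled
    for n_hidden in sizes:
        # Greedy layer-wise pre-training: the labels are not used at this stage.
        ae = SparseAutoencoder(len(inputs[0]), n_hidden).fit(inputs)
        layers.append(ae)
        inputs = [ae.encode(x) for x in inputs]

    def encode_all(x):
        for ae in layers:
            x = ae.encode(x)
        return x

    # Only this supervised step consumes the labeled set.
    return encode_all, train_logistic([encode_all(x) for x in X_labeled], y_labeled)

# Usage on hypothetical toy data (8-dim features in [0, 1]; class decides which half is "high").
rng = random.Random(1)
def toy_flow(c):
    return [min(1.0, max(0.0, (0.8 if (k < 4) == (c == 0) else 0.2) + rng.gauss(0, 0.05))) for k in range(8)]
X_unlabeled = [toy_flow(rng.randint(0, 1)) for _ in range(40)]  # labels discarded
y_labeled = [0, 1] * 4
X_labeled = [toy_flow(c) for c in y_labeled]
encode_all, head = semi_supervised_ssae(X_unlabeled, X_labeled, y_labeled)
preds = [head(encode_all(x)) for x in X_labeled]
```

In a realistic setting, `X_unlabeled` would be far larger than `X_labeled`, which is the regime this approach targets. A multi-class softmax output and end-to-end fine-tuning of the whole stack would replace the binary, frozen-feature head used in this sketch.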
