Abstract

Network traffic classification is an important and challenging aspect of network resource management, arising from the study of network development, planning, and design for 5G and beyond. Recently, traffic analysis systems for network monitoring and for restricting user access to Virtual Private Network (VPN) and non-VPN traffic have gained widespread attention. In this paper, different algorithms for classifying and detecting VPN traffic are considered. Several existing machine learning methods were tested for their performance in network traffic classification and security. The purpose is to improve Precision, Recall, and F1-score in VPN traffic classification using ensemble classifiers. To this end, the parameters of the ensemble classifiers were tuned to obtain high Precision, Recall, and F1-score. Bagging Decision Tree and Gradient Boosting algorithms were used for classification and produced promising results when compared to single classifiers such as k-Nearest Neighbors (kNN), Multilayer Perceptron (MLP), and Decision Tree. The proposed classifier achieves a recognition accuracy of up to 93.80% on a test sample, which outperforms all the single algorithms used in previous work. The MLP, Random Forest (RF), and Gradient Boosting (GB) algorithms had almost identical performance in all experiments. Furthermore, the proposed classifiers are found to perform better when the network traffic flows are generated using different values of the time parameter (timeout). Our results show that the ensemble algorithms (Random Forest and Gradient Boosting) outperform the single machine learning classifiers previously used by other researchers, and that the Random Forest classifier achieved the highest accuracy when classifying both non-VPN and VPN traffic.
The novelty lies in applying ensemble algorithms to network traffic classification for security purposes and comparing them with single classifiers in terms of Accuracy, Precision, and F1-score on a given dataset, in contrast to the established approach of feature selection and generation.
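
As an illustration of the evaluation metrics named above, the following is a minimal sketch of how Precision, Recall, and F1-score can be computed for a binary VPN versus non-VPN task. The function name `precision_recall_f1`, the label strings, and the eight example flows are hypothetical and chosen only for illustration; they are not taken from the paper's dataset or code.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "VPN"
) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) for the `positive` class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # True positives: flows labelled VPN and predicted VPN.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: non-VPN flows predicted as VPN.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: VPN flows predicted as non-VPN.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Hypothetical labels for eight flows (illustration only, not data from the paper).
y_true = ["VPN", "VPN", "VPN", "VPN", "non-VPN", "non-VPN", "non-VPN", "non-VPN"]
y_pred = ["VPN", "VPN", "VPN", "non-VPN", "VPN", "non-VPN", "non-VPN", "non-VPN"]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy example three VPN flows are correctly identified, one is missed, and one non-VPN flow is misclassified as VPN, so all three metrics equal 0.75. The same function could be applied to the predictions of any of the classifiers compared above, whether single or ensemble.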
