Abstract
The growing use of technological devices during the COVID-19 pandemic, as many employees transitioned to working from home, has produced a massive volume of data flowing over the Internet. Furthermore, as more people use a Virtual Private Network (VPN) or the Tor Browser (dark web) to keep their data private and hidden, network traffic encryption is rapidly becoming universal. This complicates the quality of service (QoS), traffic monitoring, and network security provided by Internet Service Providers (ISPs), particularly for analysis and anomaly detection approaches that rely on the nature of the network traffic. Classifying encrypted traffic is one of the most challenging problems introduced by VPNs, which are used both to bypass censorship and to gain access to geo-locked services. An efficient approach is therefore needed to identify encrypted network traffic and to extract and select valuable features that improve QoS and network management and help oversee overall performance. In this paper, the classification of network traffic into VPN and non-VPN traffic is studied based on the effectiveness of time-based features extracted from network packets. To this end, this paper proposes two machine learning models that classify network traffic as encrypted or non-encrypted. The proposed models use statistical features (SF), Pearson Correlation (PC), and a Genetic Algorithm (GA), and preprocess the traffic samples into network flows to accomplish the experiment's objectives. The GA-based method uses a stochastic search inspired by natural genetics and biological evolution to identify essential features. The PC-based method performs well in removing redundant features of network traffic.
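The abstract names three ingredients: time-based statistical features computed per flow, a Pearson Correlation (PC) filter that drops redundant features, and a Genetic Algorithm (GA) that searches for a good feature subset. The Python sketch below illustrates each under stated assumptions; it is not the authors' implementation. The feature names (`duration`, `iat_*`, `pkts_per_sec`), the 0.9 correlation threshold, the GA operators (binary tournament, one-point crossover, bit-flip mutation, elitism) and hyperparameters, and the toy fitness function are all hypothetical choices. In practice the fitness would more plausibly be a classifier's validation accuracy on the selected features.

```python
# Illustrative sketch only: feature names, threshold, and GA settings are assumptions,
# not taken from the paper.
import math
import random
import statistics


def flow_time_features(timestamps):
    """Time-based statistics for one flow, given its packet arrival times in seconds."""
    if len(timestamps) < 2:
        raise ValueError("need at least two packets to compute inter-arrival times")
    iats = [b - a for a, b in zip(timestamps, timestamps[1:])]  # inter-arrival times
    duration = timestamps[-1] - timestamps[0]
    return {
        "duration": duration,
        "iat_min": min(iats),
        "iat_max": max(iats),
        "iat_mean": statistics.fmean(iats),
        "iat_std": statistics.pstdev(iats),
        "pkts_per_sec": len(timestamps) / duration if duration > 0 else 0.0,
    }


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def pc_filter(columns, threshold=0.9):
    """Greedy PC filter: keep a feature only if |r| < threshold against every kept feature."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def _tournament(pop, fitness, rng):
    """Binary tournament: the fitter of two randomly drawn masks wins."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def ga_select(n_features, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """GA over binary feature masks (1 = feature selected); returns the fittest mask found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1 = _tournament(pop, fitness, rng)
            p2 = _tournament(pop, fitness, rng)
            cut = rng.randrange(1, n_features) if n_features > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


if __name__ == "__main__":
    print(flow_time_features([0.0, 1.0, 3.0, 6.0]))
    cols = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, 0, 1, 0]}
    print(pc_filter(cols))  # ['a', 'c']: "b" is perfectly correlated with "a"
    target = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical stand-in for a classifier-accuracy fitness
    print(ga_select(len(target), lambda m: sum(g == t for g, t in zip(m, target))))
```

The greedy PC filter keeps the first feature of each highly correlated group, so column order matters. The elitist GA never loses its best mask between generations, which keeps the search from regressing, at the cost of some diversity.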
With a per-packet prediction time on the order of microseconds, the best model achieved an accuracy of more than 95.02 percent in the most demanding traffic classification task, a drop of only 2.37 percent compared with the machine learning approach that uses the entire set of statistical features. This is highly promising for the development of real-time traffic analyzers.