Abstract
As encrypted traffic grows, network flow classification has become a significant challenge because the payload of an encrypted packet cannot be parsed. Organizations can sniff packets at a gateway under their control between the intranet and the internet to inspect network traffic. However, when an intranet user uses an identity obfuscation protocol such as VPN or TOR, the packet's IP address and port are rewritten to preserve user privacy. As a result, all of a user's packets sniffed between the user and the TOR entry node or VPN proxy share the same 5-tuple (a flow is defined as the packets with the same source IP, destination IP, source port, destination port, and IP protocol). Thus, we cannot rely on the 5-tuple rule to split traffic into flows. This challenge is called the “only one flow problem” and poses an obstacle to flow classification. A previous solution addresses this issue by using a timeout value to determine flow separation points. However, a predefined static time threshold cannot fit all user habits, which leads to separation errors. To overcome the limitations of timeouts, we propose a flexible method called AI-FlowDet, which leverages the scene change concept and a CNN model to find behavior change points in traffic, learned from data. AI-FlowDet can be applied to the traffic of identity obfuscation protocols. Next, we propose 294 size-based and direction-based features that can be used with AI-FlowDet to evaluate flow type classification performance. Every experiment is run with different machine learning algorithms. The results show that AI-FlowDet with the proposed features achieves 98.5% weighted accuracy, an increase of 12.6% over the previous timeout method with baseline features. The good results obtained on both the VPN and TOR datasets demonstrate that the proposed splitting method for the only one flow problem and the proposed features for flow type classification are effective.
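To make the only one flow problem and the timeout baseline concrete, the following is a minimal sketch, not the authors' implementation. The `Packet` type, the addresses, the timestamps, and the 5-second threshold are all assumptions chosen for illustration. It groups packets by 5-tuple, shows that tunneled traffic collapses into a single flow, and then splits that flow wherever the idle gap exceeds a static timeout.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    ts: float        # capture timestamp in seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str
    size: int        # packet length in bytes


def group_by_five_tuple(packets):
    """Group packets into flows keyed by the 5-tuple."""
    flows = defaultdict(list)
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.proto)
        flows[key].append(p)
    return dict(flows)


def split_by_timeout(packets, timeout):
    """Start a new segment whenever the idle gap exceeds `timeout` seconds."""
    segments = []
    for p in sorted(packets, key=lambda q: q.ts):
        if segments and p.ts - segments[-1][-1].ts <= timeout:
            segments[-1].append(p)
        else:
            segments.append([p])
    return segments


# Hypothetical capture between a user and a VPN proxy: every packet carries
# the same 5-tuple, whatever application produced it inside the tunnel.
tunnel = ("10.0.0.5", "203.0.113.7", 51000, 1194, "UDP")
times = [0.0, 0.4, 0.9, 6.0, 6.2, 6.5, 6.9]
packets = [Packet(t, *tunnel, 1200) for t in times]

flows = group_by_five_tuple(packets)              # one key only
segments = split_by_timeout(packets, timeout=5.0)  # split at the 5.1 s gap
```

With these made-up timestamps, a 5-second timeout yields two segments, a 10-second timeout yields one, and a 0.35-second timeout yields five. The segmentation depends entirely on the chosen threshold, which is the limitation the abstract attributes to the static timeout approach.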
Highlights
With the rise of 5G networks, hacking and information security incidents have escalated
When using AI-FlowDet with size-based and direction-based (S&D) features for flow application type classification in an identity obfuscation environment, 98.5% accuracy can be achieved with the multilayer perceptron (MLP) algorithm
When network traffic is generated from an identity obfuscation environment, it causes the only one flow problem, and we cannot leverage the 5-tuple to split traffic into flows
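As a rough illustration of what size-based and direction-based features can look like, the sketch below computes a handful of statistics from one traffic segment. These are hypothetical examples and are not the 294 features proposed in the paper. The function name, the feature names, and the `(size, direction)` input format are assumptions made for this sketch.

```python
from statistics import mean


def size_direction_features(segment):
    """Compute a few illustrative statistics from (size, direction) pairs.

    direction is +1 for outbound (user to proxy) and -1 for inbound.
    Hypothetical examples only, not the paper's feature set.
    """
    out_sizes = [s for s, d in segment if d > 0]
    in_sizes = [s for s, d in segment if d < 0]
    # Count how often consecutive packets switch direction.
    changes = sum(1 for (_, a), (_, b) in zip(segment, segment[1:]) if a != b)
    return {
        "pkt_count": len(segment),
        "out_ratio": len(out_sizes) / len(segment) if segment else 0.0,
        "mean_out_size": mean(out_sizes) if out_sizes else 0.0,
        "mean_in_size": mean(in_sizes) if in_sizes else 0.0,
        "direction_changes": changes,
    }


# Made-up segment: a small request, two large responses, a small request.
segment = [(100, +1), (1400, -1), (1400, -1), (120, +1)]
features = size_direction_features(segment)
```

Features of this kind rely only on packet lengths and directions, so they remain available when the payload is encrypted and the addresses are rewritten by the tunnel.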
Summary
With the rise of 5G networks, hacking and information security incidents have escalated. Security researchers will generally obtain information from endpoint devices (PCs, laptops, mobile phones) or network devices (routers, switches) for inspection. Most recent applications do not use the originally specified port, which causes the port-based method to become unreliable. We first describe the characteristics of the identity obfuscation protocol and the problem caused by it.
ENCRYPTION + PROXY
The tunneling protocol is a communications protocol that allows data to be transferred from one network to another. This protocol can be divided into two categories based on the purpose of use, namely, encryption and proxy. We define TOR and VPN as identity obfuscation environments, which simultaneously have encryption and proxy characteristics.