The Challenge of Only One Flow Problem for Traffic Classification in Identity Obfuscation Environments

Hong-Yen Chen, Tsung-Nan Lin

doi:10.1109/access.2021.3087528

Abstract

As encrypted traffic grows, network flow classification has become a significant issue because the payload of an encrypted packet cannot be parsed. For organizations, a feasible packet sniffing location is a controlled gateway between the intranet and the internet, where network traffic can be inspected. However, when an intranet user uses an identity obfuscation protocol such as VPN or TOR, the packet IP and port are rewritten to preserve user privacy. Packets from the same user sniffed between the user and the TOR entry node or VPN proxy always have the same 5-tuple (a flow is defined as packets with the same source IP, destination IP, source port, destination port, and IP protocol). Thus, we cannot rely on the 5-tuple rule to split traffic into flows. This challenge is called the “only one flow problem” and poses an obstacle for flow classification. A previous solution uses a timeout value to determine flow separation points. However, a predefined static time threshold cannot fit all user habits, which leads to separation errors. To overcome the limitations of timeouts, we propose a flexible method called AI-FlowDet, which leverages the scene change concept and a CNN model to find behavior change points in traffic based on learning data. AI-FlowDet can be applied to the traffic of identity obfuscation protocols. Next, we propose 294 size-based and direction-based features that can be used with AI-FlowDet to evaluate flow type classification performance. Every experiment leverages different machine learning algorithms. The results show that AI-FlowDet with the proposed features can achieve 98.5% weighted accuracy, an increase of 12.6% over the previous timeout method with baseline features. The good results obtained for both the VPN and TOR datasets demonstrate that the proposed splitting method for the only one flow problem and the proposed features for flow type classification are effective.
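The timeout baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or the previous work's code: the function name `split_by_timeout`, the arrival times, and both threshold values are assumptions chosen only to show how a static threshold changes where the traffic is cut.

```python
from typing import List


def split_by_timeout(timestamps: List[float], timeout: float) -> List[List[float]]:
    """Split a time-ordered packet sequence wherever the idle gap exceeds `timeout` seconds."""
    segments: List[List[float]] = []
    for ts in timestamps:
        # Start a new segment on the first packet or after a long idle gap.
        if not segments or ts - segments[-1][-1] > timeout:
            segments.append([ts])
        else:
            segments[-1].append(ts)
    return segments


# Hypothetical arrival times in seconds: two bursts separated by a 4 s pause,
# then a third burst after a 20 s pause.
arrivals = [0.0, 0.2, 0.5, 4.5, 4.6, 24.6, 24.9]

short_timeout = split_by_timeout(arrivals, timeout=2.0)   # cuts at both pauses
long_timeout = split_by_timeout(arrivals, timeout=10.0)   # cuts only at the 20 s pause
print(len(short_timeout), len(long_timeout))
```

The same packet sequence yields a different number of segments depending on the threshold, which is the kind of separation error a fixed timeout can introduce when user habits vary.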

Highlights

With the rise of 5G networks, hacking and information security incidents have escalated
When using AI-FlowDet with size-based and direction-based (S&D) features for flow application type classification in an identity obfuscation environment, 98.5% accuracy can be achieved with the multilayer perceptron (MLP) algorithm
When network traffic is generated from an identity obfuscation environment, it causes the only one flow problem, and we cannot leverage the 5-tuple to split traffic into flows
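The only one flow problem described in the highlights can be sketched as follows. This is an illustrative example under stated assumptions, not code from the paper: the `Packet` type, the `group_by_five_tuple` helper, the documentation-range IP addresses, and the port numbers are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    size: int


FlowKey = Tuple[str, str, int, int, str]


def group_by_five_tuple(packets: List[Packet]) -> Dict[FlowKey, List[Packet]]:
    """Group packets into flows keyed by (src IP, dst IP, src port, dst port, protocol)."""
    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for p in packets:
        flows[(p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)].append(p)
    return dict(flows)


# Hypothetical direct traffic: three applications talking to three servers.
direct = [
    Packet("192.0.2.10", "198.51.100.1", 50001, 443, "TCP", 1200),
    Packet("192.0.2.10", "198.51.100.2", 50002, 443, "TCP", 300),
    Packet("192.0.2.10", "198.51.100.3", 50003, 22, "TCP", 90),
]

# The same three activities seen at the gateway when sent through one tunnel:
# every packet now targets the same (assumed) proxy address and port.
tunneled = [
    Packet("192.0.2.10", "203.0.113.5", 40000, 1194, "UDP", p.size + 60)
    for p in direct
]

direct_flows = group_by_five_tuple(direct)
tunneled_flows = group_by_five_tuple(tunneled)
print(len(direct_flows), len(tunneled_flows))
```

With direct traffic the 5-tuple separates the three activities; behind the tunnel they collapse into a single key, so a different criterion is needed to find the boundaries between them.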

Summary

Introduction

With the rise of 5G networks, hacking and information security incidents have escalated. Security researchers will generally obtain information from endpoint devices (PCs, laptops, mobile phones) or network devices (routers, switches) for inspection. Most recent applications do not use the originally specified port, which causes the port-based method to become unreliable. We first describe the characteristics of the identity obfuscation protocol and the problem caused by it.

ENCRYPTION + PROXY

The tunneling protocol is a communications protocol that allows data to be transferred from one network to another. This protocol can be divided into two categories based on the purpose of use, namely, encryption and proxy. We define TOR and VPN as identity obfuscation environments, which simultaneously have encryption and proxy characteristics.
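As a rough illustration of why the port-based method noted in the introduction becomes unreliable, the sketch below labels traffic purely by destination port. The lookup table is a small subset of commonly registered ports, and the observations, including the tunnel port, are hypothetical examples rather than data from the paper.

```python
# A small, illustrative subset of registered destination ports.
WELL_KNOWN_PORTS = {22: "ssh", 25: "smtp", 53: "dns", 80: "http", 443: "https"}


def classify_by_port(dst_port: int) -> str:
    """Label traffic using only its destination port."""
    return WELL_KNOWN_PORTS.get(dst_port, "unknown")


# Hypothetical observations: (what the traffic really is, destination port seen).
observations = [
    ("ssh session on its registered port", 22),
    ("ssh session moved to a non-standard port", 2222),
    ("video stream carried inside a VPN tunnel", 1194),
    ("chat application tunnelled over port 443", 443),
]

labels = [classify_by_port(port) for _, port in observations]
for (truth, port), label in zip(observations, labels):
    print(f"{truth!r} on port {port} -> {label}")
```

Only the first observation is labeled correctly. The others are either unrecognized or mislabeled, because the port no longer identifies the application once it is changed or hidden behind a tunnel.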

Journal: IEEE Access
Publication Date: Jan 1, 2021
Citations: 3
License type: CC BY 4.0
