Abstract

The increasing ubiquity of network traffic and the new online applications’ deployment has increased traffic analysis complexity. Traditionally, network administrators rely on recognizing well-known static ports for classifying the traffic flowing their networks. However, modern network traffic uses dynamic ports and is transported over secure application-layer protocols (e.g., HTTPS, SSL, and SSH). This makes it a challenging task for network administrators to identify online applications using traditional port-based approaches. One way for classifying the modern network traffic is to use machine learning (ML) to distinguish between the different traffic attributes such as packet count and size, packet inter-arrival time, packet send–receive ratio, etc. This paper presents the design and implementation of NetScrapper, a flow-based network traffic classifier for online applications. NetScrapper uses three ML models, namely K-Nearest Neighbors (KNN), Random Forest (RF), and Artificial Neural Network (ANN), for classifying the most popular 53 online applications, including Amazon, Youtube, Google, Twitter, and many others. We collected a network traffic dataset containing 3,577,296 packet flows with different 87 features for training, validating, and testing the ML models. A web-based user-friendly interface is developed to enable users to either upload a snapshot of their network traffic to NetScrapper or sniff the network traffic directly from the network interface card in real time. Additionally, we created a middleware pipeline for interfacing the three models with the Flask GUI. Finally, we evaluated NetScrapper using various performance metrics such as classification accuracy and prediction time. Most notably, we found that our ANN model achieves an overall classification accuracy of 99.86% in recognizing the online applications in our dataset.

Highlights

  • Published: 21 August 2021Network traffic analysis is the process of recognizing user applications, networking protocols, and communication patterns flowing through the network [1]

  • We found that our Artificial Neural Network (ANN) model achieves an overall classification accuracy of 99.86% in recognizing the online applications in our dataset

  • It is expected that NetScrapper would make a better opportunity for network administrators to monitor their network performance and detect any suspicious traffic that could harm the network components and legitimate users

Read more

Summary

Introduction

Network traffic analysis is the process of recognizing user applications, networking protocols, and communication patterns flowing through the network [1]. Most online user applications use dynamic ports, virtual private networks, and encrypted tunnels [5] These applications are transported over HTTPS connections and have applied security protocols (e.g., SSH and SSL) for ensuring QoS provisioning, security, and privacy. Imagine a user-friendly network traffic flow classifier that network administrators can use to identify the different types of online applications flowing their networks with high accuracy. Such systems would help them to perform administrative decisions, and detect malicious traffic and secure users’ data. This paper presents NetScrapper, a lightweight ML-powered traffic flow classifier for online applications, which can be deployed on-site at the network edge.

Related Work
Architecture
Dataset
ANN Structure
RF Structure
KNN Structure
ML Models
User Interface
Experimental Evaluation
Findings
Conclusions and Future Work
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call