Abstract

Internet traffic classification is a long-standing problem for ISPs and mobile carriers, and it has been studied extensively using statistical packet header information, deep packet inspection, and machine learning. With the recent spread of end-to-end encryption and dynamic port policies, machine learning and deep learning have become essential for improving packet classification accuracy. At the same time, ISPs and mobile carriers must handle privacy carefully when collecting user packets for accounting or security. Federated learning, a recent form of distributed machine learning, runs training jobs collaboratively on the clients without uploading their data to a central server. Although federated learning provides an on-device learning framework that protects user privacy, its feasibility and performance for Internet traffic classification have not been fully examined. In this paper, we propose a federated-learning traffic classification protocol (FLIC), which achieves accuracy comparable to centralized deep learning for Internet application identification without privacy leakage. FLIC can classify new applications on the fly when a participant joins the learning process with a new application, which has not been done in previous work. We implement a prototype of the FLIC clients and server with TensorFlow, in which the clients gather packets, perform on-device training, and exchange the training results with the FLIC server. We demonstrate that federated learning-based packet classification achieves an accuracy of 88% under non-independent and identically distributed (non-IID) traffic across clients. When a client joined the learning process with a new application, which FLIC classifies dynamically, an accuracy of 92% was achieved.

Highlights

  • Internet traffic classification is a representative research topic that has been studied extensively

  • As federated learning relies on stochastic gradient descent (SGD) for optimization, the objective of the federated-learning traffic classification protocol (FLIC) is given in Equation (1)

  • Even when the total amount of data distributed across all clients is the same, in the four-class non-independent and identically distributed (non-IID) experiment with the data spread over more clients (Figure 11b), FLIC better classifies applications that have only a small amount of data

Summary

Introduction

Internet traffic classification is a representative research topic that has been studied extensively. While machine learning is useful for Internet traffic classification under packet encryption, ISPs or mobile carriers need to collect application packets and related information from users for the training process. In federated learning, the central server computes only the aggregated average of the training results gathered from each device. This protects privacy because the user data stays on the device. Although federated learning is promising for addressing privacy concerns, its feasibility for Internet traffic classification under a realistic data model with unbalanced, non-independent and identically distributed (non-IID) characteristics has not been fully understood. Considering the above requirements, we propose a federated-learning Internet traffic classification framework (FLIC) that can label packets into applications dynamically. Under non-IID traffic distribution and under a dynamically increasing number of clients, FLIC achieved 88% and 92% accuracy, respectively.
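The server-side aggregation step mentioned above can be illustrated with a minimal sketch. It assumes a sample-count-weighted average in the style of federated averaging; the summary does not state the exact weighting FLIC uses, and the function name `federated_average`, the flat weight vectors, and the client sizes are hypothetical choices made for illustration only.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Return the sample-weighted average of per-client weight vectors.

    Assumption: each client sends a flat list of model weights plus the
    number of local training samples. No raw packets reach the server.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client update")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        # Clients with more local data contribute proportionally more.
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Three hypothetical clients with unbalanced local datasets.
updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [10, 30, 60]
global_weights = federated_average(updates, sizes)
print(global_weights)
```

Weighting by local sample count is one common way to handle unbalanced clients; an unweighted mean would be an equally valid reading of "aggregated average" and can be obtained by passing equal sizes.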

Related Work
Architecture
Protocol
Training Model and Feature Vector
Federated Optimization
Dynamic Classification
Performance Evaluation
Datasets
AIM SCP
Accuracy by the Number of FLIC Participants
Accuracy under Non-IID Traffic Distribution
Accuracy by Clients Local Epochs
Dynamic Traffic Classification with Federated Learning
Findings
Conclusions
