On Internet Traffic Classification: A Two-Phased Machine Learning Approach

Taimur Bakhshi, Bogdan Ghita

doi:10.1155/2016/2048302

Open Access

Abstract

Traffic classification utilizing flow measurement enables operators to perform essential network management. Flow accounting methods such as NetFlow are, however, considered inadequate for classification, requiring additional packet-level information, host behaviour analysis, and specialized hardware, which limits their practical adoption. This paper aims to overcome these challenges by proposing a two-phased machine learning classification mechanism with NetFlow as input. The individual flow classes are derived per application through k-means and are further used to train a C5.0 decision tree classifier. As part of validation, the initial unsupervised phase used flow records of fifteen popular Internet applications that were collected and independently subjected to k-means clustering to determine the unique flow classes generated per application. The derived flow classes were afterwards used to train and test a supervised C5.0-based decision tree. The resulting classifier reported an average accuracy of 92.37% on approximately 3.4 million test cases, increasing to 96.67% with adaptive boosting. The classifier specificity factor, which accounted for differentiating content-specific from supplementary flows, ranged between 98.37% and 99.57%. Furthermore, the computational performance and accuracy of the proposed methodology in comparison with similar machine learning techniques lead us to recommend its extension to other applications in achieving highly granular real-time traffic classification.
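The abstract reports accuracy and specificity without stating how they were computed. As a rough illustration only, the sketch below assumes the standard definitions: accuracy as the share of correctly classified flows, and one-vs-rest specificity as TN / (TN + FP) for a given flow class. The confusion matrix, the class names, and the function names are all made up for the example and are not taken from the paper.

```python
def accuracy(cm):
    """Overall accuracy: correctly classified flows divided by all flows.

    cm[i][j] counts flows whose actual class is i and predicted class is j.
    """
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def specificity(cm, cls):
    """One-vs-rest specificity for class index cls: TN / (TN + FP)."""
    total = sum(sum(row) for row in cm)
    row_sum = sum(cm[cls])                  # flows that really belong to cls
    col_sum = sum(row[cls] for row in cm)   # flows predicted as cls
    false_pos = col_sum - cm[cls][cls]
    true_neg = total - row_sum - col_sum + cm[cls][cls]
    return true_neg / (true_neg + false_pos)


# Hypothetical flow classes and counts, invented purely for illustration.
class_names = ["video_content", "video_supplementary", "voip"]
confusion = [
    [50, 2, 3],
    [4, 40, 1],
    [1, 2, 47],
]

print(f"accuracy = {accuracy(confusion):.4f}")
for index, name in enumerate(class_names):
    print(f"specificity[{name}] = {specificity(confusion, index):.4f}")
```

A high specificity for a content class means that few supplementary flows are mistaken for it, which is the distinction the abstract highlights.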

Highlights

Traffic classification methods using flow and packet based measurements have been previously researched using various techniques ranging from automated machine learning (ML) algorithms to deep packet inspection (DPI) for accurate application identification.
The present paper proposes a per-flow C5.0 decision tree classifier by employing a two-phased machine learning approach while solely utilizing the existing quantitative attributes of NetFlow records.
Payload based classifiers inspect packet payloads using deep packet inspection (DPI) to identify application signatures or utilize stochastic packet inspection (SPI) to look for statistical parameters in packet payloads.

Summary

Introduction

Traffic classification methods using flow and packet based measurements have been previously researched using various techniques ranging from automated machine learning (ML) algorithms to deep packet inspection (DPI) for accurate application identification. With the increasing ubiquity of flow-level network monitoring, which offers a low-cost traffic accounting solution and typically relies on NetFlow for its scalability and ease of use, statistical classification techniques based on flow measurements have gained momentum [2, 8, 9, 10, 11, 12, 23]. The present work picks up from this narrative and solely utilizes NetFlow attributes using two-phased machine learning (ML), combining unsupervised k-means cluster analysis with a C5.0-based decision tree algorithm to achieve high accuracy in application traffic classification.
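The first phase of the approach described above can be sketched in a few lines: flows are clustered separately for each application, and every flow is then tagged with an application-plus-cluster label that a supervised learner could be trained on. This is a minimal illustration under stated assumptions, not the authors' implementation. The feature tuples, the `app_cluster` label format, the per-application `k` values, and the deterministic centroid initialisation are all hypothetical, and the second phase (the C5.0 decision tree) is not shown.

```python
import math


def kmeans(points, k, max_iters=100):
    """Plain Lloyd's k-means on tuples of floats; returns (centroids, assignments)."""
    # Deterministic start (an assumption of this sketch): spread the initial
    # centroids evenly across the sorted points instead of sampling at random.
    ordered = sorted(points)
    step = (len(ordered) - 1) / max(k - 1, 1)
    centroids = [ordered[round(i * step)] for i in range(k)]
    assignments = None
    for _ in range(max_iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        new = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new == assignments:
            break  # converged: no point changed cluster
        assignments = new
        # Move each centroid to the mean of its members; keep it if it has none.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


def derive_flow_classes(flows_by_app, k_by_app):
    """Cluster each application's flows on their own and label every flow."""
    labelled = []
    for app, flows in flows_by_app.items():
        _, assignments = kmeans(flows, k_by_app[app])
        for flow, cluster in zip(flows, assignments):
            labelled.append((flow, f"{app}_{cluster}"))
    return labelled


# Hypothetical, pre-scaled NetFlow-style features: (relative bytes, relative duration).
flows_by_app = {
    "youtube": [(0.90, 0.80), (0.95, 0.85), (0.10, 0.05), (0.12, 0.07)],
    "skype": [(0.30, 0.90), (0.32, 0.88), (0.05, 0.10)],
}
k_by_app = {"youtube": 2, "skype": 2}  # k per application is assumed, not derived

labelled = derive_flow_classes(flows_by_app, k_by_app)
for flow, flow_class in labelled:
    print(flow, flow_class)
```

The resulting `(features, flow_class)` pairs would form the training set for the supervised phase. Clustering per application, rather than over all traffic at once, keeps each derived flow class tied to a single application.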

Journal: Journal of Computer Networks and Communications
Publication Date: Jan 1, 2016
Citations: 49
License type: CC BY 4.0
