Online Incremental Learning for High Bandwidth Network Traffic Classification

H R Loo, S B Joseph, M N Marsono

doi:10.1155/2016/1465810

Abstract

Data stream mining techniques are able to classify evolving data streams such as network traffic in the presence of concept drift. In order to classify high bandwidth network traffic in real time, data stream mining classifiers need to be implemented on a reconfigurable high-throughput platform, such as a Field Programmable Gate Array (FPGA). This paper proposes an algorithm for online network traffic classification based on the concept of incremental k-means clustering to continuously learn from both labeled and unlabeled flow instances. Two distance measures for incremental k-means (Euclidean and Manhattan distance) are analyzed to measure their impact on network traffic classification in the presence of concept drift. The experimental results on real datasets show that the proposed algorithm performs consistently, with up to 94% average accuracy for both distance measures, even in the presence of concept drift. The proposed incremental k-means classification using Manhattan distance can classify network traffic 3 times faster than with Euclidean distance, at 671 thousand flow instances per second.
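As a rough illustration of the idea described above, the following minimal Python sketch shows a generic incremental k-means classifier that updates its centroids from both labeled and unlabeled flow instances and accepts either distance measure. It is not the authors' algorithm or their FPGA design: the class name `IncrementalKMeans`, the running-mean update rule, the one-centroid-per-new-class policy, and the two-element feature vectors are all assumptions made for this example.

```python
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance: needs a multiplication per feature and a square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a: Vector, b: Vector) -> float:
    """Manhattan distance: only subtractions, absolute values and additions."""
    return sum(abs(x - y) for x, y in zip(a, b))


class IncrementalKMeans:
    """Hypothetical nearest-centroid classifier with running-mean updates."""

    def __init__(
        self,
        centroids: Sequence[Vector],
        labels: Sequence[str],
        distance: Callable[[Vector, Vector], float] = manhattan,
    ) -> None:
        self.centroids: List[List[float]] = [list(c) for c in centroids]
        self.labels: List[str] = list(labels)
        self.counts: List[int] = [1] * len(self.centroids)
        self.distance = distance

    def _nearest(self, x: Vector, candidates: Sequence[int]) -> int:
        # Index of the closest centroid among the candidate indices.
        return min(candidates, key=lambda i: self.distance(self.centroids[i], x))

    def predict(self, x: Vector) -> str:
        """Return the class label of the closest centroid."""
        return self.labels[self._nearest(x, range(len(self.centroids)))]

    def learn(self, x: Vector, label: Optional[str] = None) -> str:
        """Update the model with one flow instance, labeled or unlabeled."""
        if label is None:
            # Unlabeled: assign to the closest centroid of any class.
            idx = self._nearest(x, range(len(self.centroids)))
        else:
            same_class = [i for i, l in enumerate(self.labels) if l == label]
            if not same_class:
                # Assumption: an unseen class starts a new centroid.
                self.centroids.append(list(x))
                self.labels.append(label)
                self.counts.append(1)
                return label
            idx = self._nearest(x, same_class)
        # Running-mean update: move the centroid toward the new instance.
        self.counts[idx] += 1
        rate = 1.0 / self.counts[idx]
        self.centroids[idx] = [
            c + rate * (v - c) for c, v in zip(self.centroids[idx], x)
        ]
        return self.labels[idx]
```

In this sketch, passing `distance=euclidean` or `distance=manhattan` switches the measure without changing the update logic. Manhattan distance needs no multiplications or square root, which is one plausible reason it suits a hardware implementation, although the sketch says nothing about the throughput figures reported in the paper.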

Highlights

Network traffic classification is a critical network processing task for network management
References [7,8,9] have proposed the use of data stream mining algorithms for traffic classification such as Very Fast Decision Tree [3] and Concept-Adaptive Very Fast Decision Tree [4]
We proposed online incremental k-means clustering in [16] for online network traffic classification

Summary

Introduction

Network traffic classification is a critical network processing task for network management. The complexity and dynamic nature of today’s network traffic call for traffic classification techniques that are able to adapt to new concepts. This includes the ability to classify types of traffic almost instantaneously, so that the knowledge gained from learning new concepts does not become outdated. Reference [10] proposed a new algorithm named Concept-Adaptive Rough Set based Decision Tree (CRSDT) to classify network traffic. These algorithms have successfully demonstrated the ability of data stream mining to handle dynamic and fast-changing network data streams with sustained accuracy. However, the decision tree based implementation requires an intensive training process and causes high memory consumption for model building [11].

Methods

Results

Conclusion