Abstract

Conventional network traffic detection methods based on data mining cannot efficiently handle high-throughput traffic with concept drift. Data stream mining techniques can classify evolving data streams, but most of them require completely labeled data. This paper proposes an improved data stream mining algorithm for online network traffic classification that learns incrementally from both labeled and unlabeled flows. The algorithm combines incremental k-means with a self-training semi-supervised method to continuously update the classification model as new flow instances arrive. The experimental results show that the proposed algorithm classifies 325 thousand flow instances per second and reaches 91–94 % average accuracy, even when only 10 % of the input flows are labeled. It also maintains high accuracy in the presence of concept drift. Although the Drift Detection Method detects drifts in the evaluated datasets, the proposed method with incremental learning achieves 91–94 % accuracy, compared with 60–69 % without incremental learning.
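
The combination of incremental k-means and self-training described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the class name, the seed centroids, the majority-vote labeling of clusters, and the Euclidean distance are all assumptions made for illustration, and flows are assumed to be numeric feature vectors.

```python
from collections import Counter
from math import dist


class IncrementalSemiSupervisedKMeans:
    """Hypothetical sketch: running-mean k-means with self-training."""

    def __init__(self, seed_centroids):
        # Each seed centroid is treated as one already-seen instance.
        self.centroids = [list(c) for c in seed_centroids]
        self.counts = [1] * len(self.centroids)
        # Per-cluster tally of the class labels assigned to its members.
        self.votes = [Counter() for _ in self.centroids]

    def _nearest(self, x):
        # Index of the centroid closest to x (Euclidean distance).
        return min(range(len(self.centroids)),
                   key=lambda i: dist(x, self.centroids[i]))

    def _cluster_label(self, i):
        # Majority label of cluster i, or None if it has no votes yet.
        votes = self.votes[i]
        return votes.most_common(1)[0][0] if votes else None

    def predict(self, x):
        return self._cluster_label(self._nearest(x))

    def learn(self, x, label=None):
        i = self._nearest(x)
        if label is None:
            # Self-training: an unlabeled flow takes the model's own prediction.
            label = self._cluster_label(i)
        # Incremental k-means step: move the centroid to the new running mean.
        self.counts[i] += 1
        n = self.counts[i]
        self.centroids[i] = [c + (v - c) / n
                             for c, v in zip(self.centroids[i], x)]
        if label is not None:
            self.votes[i][label] += 1
        return label
```

Each call to `learn` updates one centroid in constant time per feature, which is the property that makes this style of update suitable for streams; a labeled flow adds a vote for its true class, whereas an unlabeled flow adds a vote for the predicted class.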
