Abstract

Conventional network traffic detection methods based on data mining cannot efficiently handle high-throughput traffic with concept drift. Data stream mining techniques can classify evolving data streams, but most of them require completely labeled data. This paper proposes an improved data stream mining algorithm for online network traffic classification that learns incrementally from both labeled and unlabeled flows. The algorithm combines incremental k-means with a self-training semi-supervised method to update the classification model continuously as new flow instances arrive. The experimental results show that the proposed algorithm classifies 325 thousand flow instances per second and achieves an average accuracy of up to 91–94 %, even when only 10 % of the input flows are labeled. It also maintains high accuracy in the presence of concept drift: although the Drift Detection Method detects drifts in the evaluated datasets, the proposed method with incremental learning achieves up to 91–94 % accuracy, compared to 60–69 % without incremental learning.
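
The abstract does not give the details of the algorithm, so the following is only a minimal sketch of how incremental k-means and self-training might be combined on a flow stream. It is not the authors' implementation. The class names, the `radius` confidence threshold, and the rule of keeping one or more labeled clusters per class are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    """A labeled micro-cluster (hypothetical structure, not from the paper)."""
    centroid: list
    label: str
    count: int = 1

    def absorb(self, x):
        # Incremental k-means update: move the centroid toward x by 1/n.
        self.count += 1
        self.centroid = [c + (xi - c) / self.count
                         for c, xi in zip(self.centroid, x)]


class SelfTrainingStreamClassifier:
    """Sketch of a semi-supervised stream classifier (assumed design)."""

    def __init__(self, radius):
        # Assumed confidence threshold: an unlabeled flow is only used
        # for training if it lies within `radius` of a centroid.
        self.radius = radius
        self.clusters = []

    def _nearest(self, x, label=None):
        candidates = [c for c in self.clusters
                      if label is None or c.label == label]
        if not candidates:
            return None, math.inf
        best = min(candidates, key=lambda c: math.dist(c.centroid, x))
        return best, math.dist(best.centroid, x)

    def predict(self, x):
        best, _ = self._nearest(x)
        return best.label if best else None

    def learn(self, x, label=None):
        """Update the model with one flow; return the label used, if any."""
        if label is not None:
            # Labeled flow: update a nearby cluster of the same class,
            # or start a new cluster for that class.
            best, dist = self._nearest(x, label)
            if best is not None and dist <= self.radius:
                best.absorb(x)
            else:
                self.clusters.append(Cluster(list(x), label))
            return label
        # Unlabeled flow: self-training. Accept the nearest cluster's
        # label as a pseudo-label only when the flow is close enough.
        best, dist = self._nearest(x)
        if best is not None and dist <= self.radius:
            best.absorb(x)
            return best.label
        return None  # too uncertain; the model is left unchanged
```

In this sketch, every accepted flow shifts a centroid slightly, which is one way a model could follow gradual concept drift without retraining from scratch. A real system would also need feature scaling, a bound on the number of clusters, and a policy for aging out stale ones, none of which are shown here.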
