Abstract

Real-time classification of internet traffic is critical for efficient network management. Classification approaches based on machine learning have shown promising results with high accuracy. In this article, the suitability of packet-level and flow-level features is validated using stepwise regression and random forest feature selection. In addition, the optimal percentage of packets to consider within a flow when extracting flow-level features is determined. Several experiments are conducted using naive Bayes, support vector machine, $k$-nearest neighbor, random forest, and artificial neural networks on the publicly available University of Brescia (UNIBS) and University of New Brunswick (UNB) datasets. The experiments show that using 60% of a flow's packets is a good compromise, preserving high classification performance while keeping processing time low. The results also indicate that random forest outperforms the other algorithms, achieving a maximum accuracy of 98.5% and an F-score of 0.932. Furthermore, because software-based classifiers cannot meet the anticipated real-time requirements, we propose a field-programmable gate array (FPGA)-based random forest implementation that uses a highly pipelined architecture to accelerate this time-consuming task. The proposed design achieves an average throughput of 163.24 Gbps, exceeding the throughput of reported hardware-based classifiers that use comparable approaches, which in turn ensures the continuity of real-time traffic classification in congested data centers.
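As an illustration of what extracting flow-level features from a fraction of a flow's packets could look like, the following is a minimal sketch in Python. It makes several assumptions that the abstract does not confirm: that the considered packets are the first ones in arrival order, that a flow is given as a list of (timestamp, size) pairs, and that the features are simple size and inter-arrival statistics. The function name `flow_features` and the feature names are hypothetical and are not the feature set used in the article.

```python
import math
from statistics import mean, pstdev


def flow_features(packets, fraction=0.6):
    """Compute illustrative flow-level features from part of a flow.

    packets  -- list of (timestamp_seconds, size_bytes) in arrival order
    fraction -- share of the flow's packets to consider (assumed to be
                the leading packets; the abstract does not say which)
    """
    if not packets:
        raise ValueError("flow has no packets")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Keep at least one packet, rounding the packet count up.
    n = max(1, math.ceil(fraction * len(packets)))
    head = packets[:n]

    times = [t for t, _ in head]
    sizes = [s for _, s in head]
    # Inter-arrival times between consecutive considered packets.
    gaps = [b - a for a, b in zip(times, times[1:])]

    return {
        "packets_used": n,
        "total_bytes": sum(sizes),
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "mean_iat": mean(gaps) if gaps else 0.0,
        "duration": times[-1] - times[0],
    }


# Hypothetical five-packet flow; 60% of it is three packets.
example_flow = [(0, 100), (1, 200), (2, 300), (3, 400), (4, 500)]
example_features = flow_features(example_flow, fraction=0.6)
```

A feature dictionary of this kind could then be fed to any of the classifiers named above. Lowering `fraction` reduces the work done per flow at the possible cost of accuracy, which is the trade-off the experiments quantify.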
