Abstract

Feature selection is often used as a pre-processing step for machine-learning-based network traffic classification. Many feature selection techniques have been developed to find an optimal subset of relevant features and to improve overall classification accuracy. However, such techniques ignore the class imbalance problem encountered in network traffic classification, so the selected feature subset may be biased towards the traffic class that accounts for the majority of traffic flows on the Internet. To address this issue, this paper proposes a new approach, called class-oriented feature selection (COFS), which identifies a relevant feature subset for every class. COFS combines a proposed local metric with an existing global metric to yield a potentially optimal feature subset for each class, and then removes the redundant features in each subset based on the weighted symmetric uncertainty. Additionally, to improve generalization on network traffic data, an ensemble-learning-based scheme is presented with COFS to overcome the negative impact of data drift on a traffic classifier. Experiments on real-world network traffic data show that COFS outperforms existing feature selection techniques in most cases. Moreover, our approach achieves >96% flow accuracy and >93% byte accuracy on average.
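
As a rough illustration of the redundancy measure mentioned above, the following minimal sketch computes the standard, unweighted symmetric uncertainty between two discrete variables, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). It is not the weighted variant used by COFS, whose weighting scheme is not specified in this abstract; the function names `entropy` and `symmetric_uncertainty` are hypothetical, and features are assumed to be already discretized.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """Standard (unweighted) SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)).

    Assumption: x and y are equal-length sequences of discrete values.
    The class-dependent weighting used by COFS is NOT reproduced here.
    """
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        # Both variables are constant, so there is no uncertainty to share.
        return 0.0
    # Joint entropy H(X, Y), computed over paired observations.
    hxy = entropy(list(zip(x, y)))
    mutual_info = hx + hy - hxy
    return 2.0 * mutual_info / (hx + hy)
```

A value near 1 indicates that one feature is almost fully predictable from the other, making it a candidate for removal as redundant; a value near 0 indicates that the two features carry largely independent information.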

