Abstract

Network traffic classification is the task of identifying each traffic flow that passes through a network. Several methods have been applied to it in the past, including port-based, payload-based and behavior-based approaches, and each has been found to have limitations. Attention has therefore shifted to Machine Learning (ML) methods that rely on the statistical properties of the generated traffic flows. However, ML methods do not perform well when confronted with large-scale traffic data containing many features and instances. Feature selection is employed to remove irrelevant and redundant features before the data is passed to ML classifiers. In this study, network traffic classification using ML methods is demonstrated from two perspectives: one that involves feature selection and one that does not. Several performance metrics are considered, including runtime, accuracy, recall, precision and F-score. The experimental results indicate that classification without feature selection has an average accuracy of 94.14% and an average runtime of 0.52 seconds. In contrast, classification with feature selection achieves an accuracy of 95.61% and an average runtime of 0.25 seconds. This improvement reflects the importance of supplying only relevant and non-redundant features to the ML methods. It is therefore recommended that feature selection be included in the network traffic classification process to improve both accuracy and runtime.
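The abstract does not name the feature selection technique used in the study. As a minimal sketch of the general idea only, the following assumes a simple correlation-based filter: a feature is treated as irrelevant when its absolute Pearson correlation with the class label falls below a threshold, and as redundant when it is highly correlated with a feature that has already been kept. The function names, thresholds, feature names and toy flow values are all hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_features(columns, labels, min_relevance=0.3, max_redundancy=0.9):
    """Return feature names that are relevant to the label and not redundant.

    columns: dict mapping feature name -> list of values (one per flow).
    labels:  list of numeric class labels (one per flow).
    Both thresholds are illustrative assumptions, not values from the study.
    """
    relevance = {
        name: abs(pearson(values, labels)) for name, values in columns.items()
    }
    # Drop irrelevant features, then visit the rest from most to least relevant.
    candidates = sorted(
        (name for name in columns if relevance[name] >= min_relevance),
        key=lambda name: relevance[name],
        reverse=True,
    )
    selected = []
    for name in candidates:
        # Keep a feature only if it is not near-duplicated by one already kept.
        is_redundant = any(
            abs(pearson(columns[name], columns[kept])) >= max_redundancy
            for kept in selected
        )
        if not is_redundant:
            selected.append(name)
    return selected


# Hypothetical toy data: eight flows, two traffic classes (0 and 1).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "mean_pkt_size": [60, 64, 70, 66, 1200, 1300, 1250, 1400],
    "total_bytes": [600, 640, 700, 660, 12000, 13000, 12500, 14000],  # scaled copy
    "duration": [5, 1, 4, 2, 6, 3, 8, 7],
    "noise": [1, 2, 1, 2, 1, 2, 1, 2],  # uncorrelated with the label
}

selected = select_features(columns, labels)
print(selected)
```

On this toy data the filter discards the noise column and keeps only one of the two near-identical size features, so a downstream classifier would train on fewer columns. That reduction is the mechanism by which feature selection can lower runtime, as the reported results suggest.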
