Abstract

With the massive volume of traffic now crossing the internet, internet service providers (ISPs) and network service providers (NSPs) are looking for ways to accurately predict the application type of each traffic flow. Such prediction is critical for security and network monitoring applications, which require the application type to be known in advance. Traditional port-based and payload-based analysis is no longer sufficient, because many applications now use dynamic or unknown port numbers, masquerading, and encryption to avoid detection. Recently, machine learning has gained significant attention in many prediction tasks, including traffic classification from flow features or characteristics. However, such algorithms suffer from an imbalanced data problem: some applications have far fewer flow samples than others and are therefore difficult to predict. In this paper, we employ network flow-level characteristics to identify the application type of traffic. Furthermore, we propose the use of an improved support vector machine (SVM) algorithm, named cost-sensitive SVM (CMSVM), to solve the imbalance problem in network traffic identification. CMSVM adopts a multi-class SVM algorithm with active learning, which dynamically assigns a weight to each application. We examine the classification accuracy and performance of the CMSVM algorithm using two different datasets, namely MOORE_SET and NOC_SET. Our results show that, compared with other machine learning techniques, the CMSVM algorithm can reduce computation cost, improve classification accuracy, and solve the imbalance problem.
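To make the idea of per-application weights concrete, the following is a minimal sketch of one common cost-sensitive heuristic: weighting each class inversely to its frequency, so that errors on rare applications cost more during training. It is not the CMSVM weighting rule, which the abstract describes as being assigned dynamically through active learning. The function name `inverse_frequency_weights` and the flow labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class, inversely proportional to class frequency.

    Uses the common "balanced" heuristic:
        weight = n_samples / (n_classes * class_count)
    This is an illustrative assumption, not the CMSVM weighting scheme.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical imbalanced flow labels: one application dominates the data.
flow_labels = ["WWW"] * 90 + ["P2P"] * 8 + ["ATTACK"] * 2
weights = inverse_frequency_weights(flow_labels)

# Rare classes receive larger weights than the dominant class.
for app, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{app}: {w:.3f}")
```

Weights of this kind are typically passed to an SVM trainer as per-class misclassification costs, so that a mistake on a minority application is penalized more heavily than one on the majority application.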
