Analysis of Network Traffic Measurements using Machine Learning

Hae-Duck Joshua Jeong

doi:10.35444/ijana.2023.14604

Abstract

In recent times, an exponential increase in internet traffic has been observed as a result of the ongoing development of the Internet of Things, mobile networks with sensors, and communication functions within various devices. Further, the COVID-19 pandemic has led to an explosion of social network traffic. Within this context, considerable attention has been drawn to research on network traffic analysis based on machine learning. In this paper, we design and develop a new machine learning framework for network traffic analysis that distinguishes normal traffic from abnormal traffic. To achieve this, we combine well-known machine learning algorithms with network traffic analysis techniques. Using KDD CUP'99, one of the most widely used datasets, in the Weka and Apache Spark environments, we compare and investigate results obtained from time-series analysis of various aspects, including malicious code, feature extraction, data formalization, and network traffic measurement tool implementation. Experimental analysis showed that both the logistic regression and the support vector machine algorithms performed well in the performance evaluation, and that, of the two, logistic regression performed better. The quantitative analysis results of our proposed machine learning framework show that this approach is reliable and practical. In addition, we determined that the framework exhibits a much faster processing speed in the Apache Spark environment than in Weka as more datasets are used to create and classify machine learning models.
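To make the classification task concrete, the following is a minimal sketch of logistic regression separating normal from abnormal records. It is not the framework described in this paper, which uses Weka and Apache Spark on KDD CUP'99. The sketch assumes plain Python, two hypothetical features (`error_rate` and `connection_rate`, scaled to the range 0 to 1), and a handful of hand-made records that are purely illustrative.

```python
import math

# Hypothetical, hand-made records: (error_rate, connection_rate), scaled to [0, 1].
# These values are illustrative only and are NOT taken from KDD CUP'99.
RECORDS = [
    (0.10, 0.00), (0.20, 0.10), (0.15, 0.05), (0.30, 0.10),  # normal traffic
    (0.90, 0.80), (0.80, 0.90), (0.70, 0.95), (0.95, 0.70),  # abnormal traffic
]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = normal, 1 = abnormal


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, x):
    """Linear score w . x + b for one record."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train(records, labels, learning_rate=0.5, epochs=2000):
    """Fit logistic regression parameters with batch gradient descent."""
    weights = [0.0] * len(records[0])
    bias = 0.0
    n = len(records)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(records, labels):
            # Gradient of the log loss with respect to the linear score.
            error = sigmoid(score(weights, bias, x)) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (abnormal) when the estimated probability is at least 0.5."""
    return 1 if sigmoid(score(weights, bias, x)) >= 0.5 else 0


weights, bias = train(RECORDS, LABELS)
predictions = [predict(weights, bias, x) for x in RECORDS]
accuracy = sum(p == y for p, y in zip(predictions, LABELS)) / len(LABELS)
print(f"training accuracy: {accuracy:.2f}")
```

The toy data are well separated, so the accuracy on these eight records says nothing about performance on real traffic. A realistic evaluation would use held-out data and the full KDD CUP'99 feature set.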
