Abstract

As network environments grow in complexity and scale, methods that detect attacks and intrusions by classifying network traffic as normal or abnormal are reaching their limits. The number of network traffic signatures is increasing exponentially, to the point that semi-real-time detection is no longer possible. Machine learning-based intrusion detection, however, provides only simple guidance on the content of security events. The reason is that security data cannot be configured for a specific environment because of data noise, diversification, and the continuous alteration of system and network environments. Even when a machine learning model is trained and evaluated on a generalized dataset, its performance can be expected to be similar only in that specific network environment. In this study, we propose a high-speed outlier detection method that customizes a network dataset in real time for a continuously changing network environment. The proposed method filters noisy data with an ensemble model that uses the voting results of six classifiers (decision tree, random forest, support vector machine, naive Bayes, k-nearest neighbors, and logistic regression) to reflect the distribution and environmental characteristics of the dataset. To evaluate the method, we measured attack detection accuracy while gradually reducing the noisy data in a time-series dataset. In the experiment, the proposed method maintained a training dataset small enough for semi-real-time learning, 10% of the total training dataset, while showing the same level of accuracy as a detection model trained on a large training dataset. These results could serve as a basis for automatic tuning of network datasets and machine learning models for special-purpose environments and devices, such as industrial control system (ICS) environments.
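The voting-based noise filter described above can be illustrated with a minimal sketch. It assumes that each of the six classifiers has already produced a predicted label for every training sample (for example, through cross-validated prediction), and it keeps a sample only when enough classifiers reproduce its recorded label. The function name `filter_noise` and the threshold `min_agree` are hypothetical; the abstract does not state the exact voting rule, so this is one plausible reading rather than the authors' implementation.

```python
from typing import Hashable, List, Sequence


def filter_noise(
    labels: Sequence[Hashable],
    predictions: Sequence[Sequence[Hashable]],
    min_agree: int = 4,
) -> List[int]:
    """Return the indices of samples that survive the vote.

    labels      -- recorded label of each sample
    predictions -- one sequence of predicted labels per classifier
    min_agree   -- assumed threshold: a sample is kept when at least
                   this many classifiers agree with its recorded label
    """
    kept = []
    for i, label in enumerate(labels):
        # Count the classifiers whose prediction matches the recorded label.
        agree = sum(1 for clf_preds in predictions if clf_preds[i] == label)
        if agree >= min_agree:
            kept.append(i)
    return kept


# Toy example: three samples, six classifiers (one row per classifier).
labels = ["normal", "attack", "normal"]
predictions = [
    ["normal", "attack", "normal"],
    ["normal", "attack", "normal"],
    ["normal", "normal", "normal"],
    ["normal", "normal", "normal"],
    ["normal", "normal", "attack"],
    ["normal", "normal", "attack"],
]
kept_indices = filter_noise(labels, predictions)  # samples 0 and 2 are kept
```

In the toy data, only two of the six classifiers support the recorded label of the second sample, so it is treated as noise and dropped. Raising `min_agree` filters more aggressively and yields a smaller training set.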

Highlights

  • With information and communication technology development, various services and computing environments are interconnected to create higher value

  • 20,000 records, about 10% of the entire dataset, were used as the initial training dataset; the remaining 205,711 records were divided by a specific time unit and used as new data

  • Experiments were performed to compare the performance of the proposed method with the ideal machine learning detection method
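The data split mentioned in the highlights can be sketched as follows. The sketch assumes that each record carries a numeric timestamp as its first element, takes the earliest records as the initial training set, and groups the remainder into consecutive fixed-length time windows. The function name `split_stream` and its parameters are hypothetical, and the source does not state the time unit used.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Record = Tuple[float, object]  # (timestamp, payload); the layout is an assumption


def split_stream(
    records: Sequence[Record],
    initial_size: int,
    window: float,
) -> Tuple[List[Record], List[List[Record]]]:
    """Split records into an initial training set and time-ordered windows."""
    ordered = sorted(records, key=lambda r: r[0])
    initial, rest = ordered[:initial_size], ordered[initial_size:]
    if not rest:
        return initial, []
    start = rest[0][0]
    # Records whose offset from `start` falls in the same window share a key.
    windows = [
        list(group)
        for _, group in groupby(rest, key=lambda r: int((r[0] - start) // window))
    ]
    return initial, windows


# Toy example: ten records at t = 0..9, two for training, windows of 3 time units.
records = [(float(t), f"r{t}") for t in range(10)]
initial, windows = split_stream(records, initial_size=2, window=3.0)
window_sizes = [len(w) for w in windows]  # [3, 3, 2]
```

With the sizes reported above, `initial_size` would be 20,000 and the windows would cover the remaining 205,711 records.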


Summary

Introduction

As information and communication technology develops, various services and computing environments are being interconnected to create higher value, and networks have reached a very high level of complexity and scale. Network data has already reached a volume and bandwidth that cannot be processed in real time, creating new physical and technical challenges that existing security systems and services must solve [1,2]. Attacks on networks are gradually diversifying in pattern and form by actively exploiting complex network characteristics [3]. As the communication protocols and services that make up the network diversify, it is becoming easier to modify existing attack techniques and apply them to a new environment, where they may be classified as unknown attacks [4].

