Abstract

The number of studies applying machine learning to cyber security has increased over the past few years. These studies, however, struggle to become usable in the real world, mainly because of a lack of training data and the limited reusability of trained models. While transfer learning appears to address these problems, studies in the field of intrusion detection remain scarce. Therefore, this study proposes payload feature-based transfer learning, which uses knowledge from an already known domain to address the lack of training data when applying machine learning to intrusion detection. First, it extends the range of information extraction from the header to the payload, using an effective hybrid feature extraction method so that the information is transferred accurately. Second, it provides an improved optimization method for the extracted features to create a labeled dataset for a target domain. The proposal was validated on publicly available datasets in three distinct scenarios, and the results confirmed its practical usability: transfer learning increased the accuracy of the created training data by 30% compared with the non-transfer learning method. In addition, we showed that this approach can help identify previously unknown attacks and reuse models across different domains.

Highlights

  • With the advance of information technology, cyberattacks are becoming more intelligent and mass-produced, overwhelming the detection, analysis, and response abilities of traditional security approaches [1,2]

  • We showed that the training data generated by the proposed approach can be used to detect new types of attacks that do not exist in the training data, and demonstrated that the model could be reused in other domains (Section 5)

  • We propose payload feature-based transfer learning (PF-TL), described in Figure 2, in which intrusion detection knowledge is transferred from a source domain to a target domain in order to train on an unlabeled dataset

Summary

Introduction

With the advance of information technology, cyberattacks are becoming more intelligent and mass-produced, overwhelming the detection, analysis, and response abilities of traditional security approaches [1,2]. The number of studies applying artificial intelligence technology to the cybersecurity field is increasing [3,4,5]. Among these, intrusion detection is one field in which machine learning shows higher detection rates and fewer false positives than conventional signature-based detection methods [5,6,7,8]. To understand how machine learning outperforms signature-based methods, one must first understand the structure of the data in a common intrusion detection event. The header contains network information and the flow of the source

