Comparative Experiment on TTP Classification with Class Imbalance Using Oversampling from CTI Dataset

Heejung Kim, Hwankuk Kim, Zhe-Li Liu

doi:10.1155/2022/5021125

Journal: Security and Communication Networks
Publication Date: Oct 12, 2022
Citations: 2
License type: CC BY 4.0

Affiliation: Sangmyung University

Abstract

Cyber threat intelligence (CTI) refers to the real-time collection of threat information and the analysis of these data to identify the situation and attack mechanism of a security threat. In CTI analysis, a standardized attack model is important. Recently, the MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) framework has been widely used as the de facto standard for security threat modeling. However, mapping a large amount of data to the tactics, techniques, and procedures (TTP) of ATT&CK is time-consuming for a limited number of security personnel. To reduce this cost, research on automatically classifying TTP from CTI data using artificial intelligence (AI) techniques is underway but remains challenging. Because CTI data are domain-specific, labeled data for training AI models are difficult to obtain, and as a result, the distribution of TTP labels in the training data is imbalanced. Consequently, the accuracy of machine learning (ML)-based TTP classification remains at approximately 60–80%. This study aims to improve TTP classification accuracy on unstructured CTI data using ML, focusing mainly on the problems of small training datasets and TTP class imbalance. To this end, we propose a TTP classification method that applies easy data augmentation (EDA) and compare its performance with those of previous studies. With the proposed methodology, a 60–80% improvement was observed compared with the reference baseline model, TRAM. This indicates that applying EDA as a preprocessing step is effective at improving TTP classification performance on unstructured CTI data in the CTI domain.
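The abstract does not spell out the preprocessing pipeline. The following is a minimal sketch, assuming that EDA refers to the four word-level operations of Wei and Zou (synonym replacement, random insertion, random swap, random deletion) and that it is used to oversample minority TTP classes until every class matches the largest one. The synonym table, the function names (`eda`, `oversample_with_eda`, etc.), the sample sentences, and the `alpha` parameter are hypothetical illustrations, not the authors' implementation; the labels are real ATT&CK technique IDs used only as example class names. A real pipeline would typically draw synonyms from a lexicon such as WordNet.

```python
import random
from collections import Counter

# Hypothetical toy synonym table (assumption); a real pipeline might use WordNet.
SYNONYMS = {
    "attacker": ["adversary", "actor"],
    "sends": ["delivers", "emails"],
    "malicious": ["harmful", "weaponized"],
    "executes": ["runs", "launches"],
}


def synonym_replacement(words, n, rng):
    """Replace up to n words that have an entry in SYNONYMS."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(words, n, rng):
    """Insert a synonym of a random eligible word at a random position, n times."""
    out = list(words)
    for _ in range(n):
        eligible = [w for w in out if w in SYNONYMS]
        if not eligible:
            break
        syn = rng.choice(SYNONYMS[rng.choice(eligible)])
        out.insert(rng.randint(0, len(out)), syn)
    return out


def random_swap(words, n, rng):
    """Swap the positions of two randomly chosen words, n times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words, p, rng):
    """Drop each word with probability p, always keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


def eda(sentence, rng, alpha=0.1):
    """Return one augmented sentence produced by a randomly chosen EDA operation."""
    words = sentence.split()
    n = max(1, int(alpha * len(words)))
    op = rng.choice(["sr", "ri", "rs", "rd"])
    if op == "sr":
        new = synonym_replacement(words, n, rng)
    elif op == "ri":
        new = random_insertion(words, n, rng)
    elif op == "rs":
        new = random_swap(words, n, rng)
    else:
        new = random_deletion(words, alpha, rng)
    return " ".join(new)


def oversample_with_eda(texts, labels, seed=0):
    """Augment minority classes until each class matches the majority count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, l in zip(texts, labels) if l == label]
        for _ in range(target - count):
            out_texts.append(eda(rng.choice(pool), rng))
            out_labels.append(label)
    return out_texts, out_labels


# Hypothetical example: three phishing (T1566) reports vs. one scripting (T1059) report.
texts = [
    "attacker sends malicious attachment to victim",
    "attacker sends phishing link to employee",
    "attacker sends spearphishing email with macro",
    "malware executes powershell script on host",
]
labels = ["T1566", "T1566", "T1566", "T1059"]
aug_texts, aug_labels = oversample_with_eda(texts, labels)
print(Counter(aug_labels))  # both classes now have 3 samples
```

In this sketch only the minority class receives synthetic sentences, so the original samples are kept unchanged and the label distribution is balanced. If such augmentation were applied in practice, it would normally be run on the training split only, so that near-duplicate augmented sentences do not leak into the evaluation data.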

Full Text