Abstract

As cyber attacks increase, Cyber Threat Intelligence (CTI) enhances the ability of security systems to resist novel threats. However, most CTI is unstructured data written in natural language, so security experts must read and summarize it before it can be used effectively. To address this problem, we adopt the ATT&CK matrix as the taxonomy and propose a method that automatically maps unstructured threat intelligence to tactics and techniques. The proposed method consists of a pre-processor that denoises the text, a label extractor that classifies which tactic and technique categories the text belongs to, and a post-processor that corrects the classification results. The label extractor comprises two DistilBERT-based multi-label classifiers, one for tactics and one for techniques. The post-processor corrects the classification results using the relations between tactics, techniques, and sub-techniques in the matrix, eliminating errors caused by treating the categories as independent. For the evaluation, we collect text from the ATT&CK knowledge base and from real cyber threat reports to build an experimental dataset of 26,602 sentence samples, and we apply the proposed method to this dataset to verify its effectiveness. The results show that the proposed method accurately retrieves tactics and techniques, with F0.5 scores of 85.50% and 75.17% respectively, outperforming the baseline method by about 10%.
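
The abstract does not spell out the correction rule used by the post-processor, so the following is only a minimal sketch of how the output of two independent multi-label classifiers might be reconciled with the tactic-technique relations of the matrix. The function name `correct_labels`, the two thresholds, and the specific rule (keep a technique only if one of its parent tactics is predicted; promote the parent tactic when a single-parent technique is predicted with very high confidence) are assumptions made for illustration and are not taken from the paper. The mapping is a small subset of real ATT&CK identifiers.

```python
from typing import Dict, Set, Tuple

# Illustrative subset of the technique -> parent tactics relation in ATT&CK.
TECHNIQUE_TO_TACTICS: Dict[str, Set[str]] = {
    "T1566": {"TA0001"},                      # Phishing -> Initial Access
    "T1059": {"TA0002"},                      # Command and Scripting Interpreter -> Execution
    "T1053": {"TA0002", "TA0003", "TA0004"},  # Scheduled Task/Job -> Execution, Persistence, Privilege Escalation
}


def correct_labels(
    tactic_scores: Dict[str, float],
    technique_scores: Dict[str, float],
    threshold: float = 0.5,  # assumed decision threshold for both classifiers
    strong: float = 0.9,     # assumed confidence above which a technique promotes its tactic
) -> Tuple[Set[str], Set[str]]:
    """Reconcile independent tactic and technique predictions (hypothetical rule)."""
    tactics = {t for t, s in tactic_scores.items() if s >= threshold}
    candidates = {t: s for t, s in technique_scores.items() if s >= threshold}

    # Pass 1: a very confident technique with exactly one parent tactic
    # implies that tactic, even if the tactic classifier missed it.
    for tech, score in candidates.items():
        parents = TECHNIQUE_TO_TACTICS.get(tech, set())
        if score >= strong and len(parents) == 1:
            tactics |= parents

    # Pass 2: drop techniques that have no predicted parent tactic.
    techniques = {
        tech
        for tech in candidates
        if TECHNIQUE_TO_TACTICS.get(tech, set()) & tactics
    }
    return tactics, techniques


if __name__ == "__main__":
    print(correct_labels(
        {"TA0001": 0.8, "TA0002": 0.2},
        {"T1566": 0.7, "T1059": 0.6, "T1053": 0.3},
    ))
```

In this example the weakly predicted `T1059` is discarded because its only parent tactic, `TA0002`, was not predicted, while `T1566` is kept because `TA0001` was. A real implementation would also need the sub-technique level mentioned in the abstract, which this sketch omits.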
