Abstract
Cyber-threat attribution is the identification of attacker responsible for a cyber-attack. It is a challenging task as attacker uses different obfuscation and deception techniques to hide its identity. After an attack has occurred, digital forensic investigation is conducted to collect evidence from network/system logs. After investigation and collecting evidence reports are published in multiple formats such as text and PDF. There is no standard format for publishing these reports, so extracting meaningful information from these reports is a challenging task. Manual extraction of features from unstructured cyber-threat intelligence (CTI) is a difficult task. There is a need for an automated mechanism to extract features from unstructured reports and attribute cyber-threat actor (CTA). The aim of this research is to develop a mechanism to attribute or profile cyber threat actors (CTA) by extracting features from CTI reports. Moreover define a methodology to extract features from unstructured CTI reports by using natural language processing (NLP) techniques and then attributing cyber threat actor by using machine learning algorithms. Extracting features i.e., tactics, techniques, tools, malware, target organization/country and application by using novel embedding model known as” Attack2vec” which is trained on domain specific embeddings. Training model on domain specific embedding produces high results as compared to model train on general embeddings specially in the field of cyber security. Results of this novel model is compared with different methods. Machine learning algorithms such as decision tree, random forest, support vector machine is used for classification of CTA. This novel model produces high results as compared to other models with Accuracy of 96%, Precision of 96.4%, Recall of 95.58% and F1-measure of 95.75%.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have