CVE2ATT&CK: BERT-Based Mapping of CVEs to MITRE ATT&CK Techniques

Octavian Grigorescu, Razvan Rughinis, Andreea Nica, Mihai Dascalu

doi:10.3390/a15090314


Open Access

https://doi.org/10.3390/a15090314

Journal: Algorithms
Publication Date: Aug 31, 2022
Citations: 15
License type: CC BY 4.0

Affiliation: Polytechnic University of Bucharest

Abstract

Since cyber-attacks are ever-increasing in number, intensity, and variety, a strong need for a global, standardized cyber-security knowledge database has emerged as a means to prevent and fight cybercrime. Attempts already exist in this regard. The Common Vulnerabilities and Exposures (CVE) list documents numerous reported software and hardware vulnerabilities, thus building a community-based dictionary of existing threats. The MITRE ATT&CK Framework describes adversary behavior and offers mitigation strategies for each reported attack pattern. While extremely powerful on their own, the tremendous extra benefit gained when linking these tools cannot be overlooked. This paper introduces a dataset of 1813 CVEs annotated with all corresponding MITRE ATT&CK techniques and proposes models to automatically link a CVE to one or more techniques based on the text description from the CVE metadata. We establish a strong baseline that considers classical machine learning models and state-of-the-art pre-trained BERT-based language models while counteracting the highly imbalanced training set with data augmentation strategies based on the TextAttack framework. We obtain promising results, as the best model achieved an F1-score of 47.84%. In addition, we perform a qualitative analysis that uses LIME explanations to point out limitations and potential inconsistencies in CVE descriptions. Our model plays a critical role in finding kill chain scenarios inside complex infrastructures and enables the prioritization of CVE patching by threat level. We publicly release our code together with the dataset of annotated CVEs.
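
Because each CVE may be linked to one or more techniques, the task is a multi-label classification problem, and an F1-score has to be aggregated across labels. The abstract does not state which averaging scheme underlies the reported 47.84%, so the following is only a minimal sketch that assumes micro-averaging. The function name `micro_f1` and the toy annotations are hypothetical and are not taken from the released code or dataset.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over per-CVE sets of technique labels.

    Assumption: micro-averaging is used here for illustration only;
    the abstract does not say which averaging the authors report.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Count label-level hits and misses, pooled over all CVEs.
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # correctly predicted
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # predicted, not annotated
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # annotated, not predicted
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy, made-up annotations for three CVEs (illustrative only).
gold = [{"T1190", "T1059"}, {"T1068"}, {"T1190"}]
pred = [{"T1190"}, {"T1068", "T1059"}, set()]

score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.4f}")
```

On a highly imbalanced label set such as the one described above, micro-averaging is dominated by frequent techniques, whereas macro- or weighted averaging would treat rare techniques differently, so the choice of scheme affects how a single F1 figure should be read.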

Full Text