Towards Vulnerability Types Classification Using Pure Self-Attention: A Common Weakness Enumeration Based Approach

Tianyi Wang, Shengzhi Qin, Kam Pui Chow

doi:10.1109/cse53436.2021.00030

Publication Date: Oct 1, 2021

Citations: 8

Affiliation: University of Hong Kong

Abstract

The rise in malicious cyberattacks has drawn attention to cybersecurity and software vulnerabilities. Common Vulnerabilities and Exposures (CVE), a well-known vulnerability database, is widely referenced as a standard in the cybersecurity field for both research and commercial purposes. Over the past decade, the Common Weakness Enumeration (CWE) has provided a useful taxonomy for classifying CVE entries. However, CWE categories are assigned entirely by hand, so cybersecurity professionals must wait an unpredictable length of time for up-to-date information to be published. In this study, we introduce a new CWE-based method for classifying vulnerability types, built on the CVE dataset. Our method adopts a transformer encoder-decoder architecture and relies purely on self-attention, with no convolution or recurrence. We first encode the CVE input entries to learn representative features, then decode them to classify vulnerability types according to the CWE standard. In our experiments, a fine-tuned, deep pre-trained Bidirectional Encoder Representations from Transformers (BERT) model automatically classifies unlabeled CVE candidates and assigns them CWE IDs. The proposed method outperforms all classical Natural Language Processing (NLP) baseline algorithms, achieving an accuracy of 90.74% on the testing dataset. In addition, we expect the trained model to reach industry-level accuracy when applied to real-world cyber threat intelligence articles and reports.
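To make the "pure self-attention" idea concrete, the following is a minimal toy sketch, not the authors' model. It assumes identity query/key/value projections, mean pooling, and a hand-picked linear classifier; the function names, the two-dimensional token vectors, and the class weights for the two example CWE IDs are all hypothetical and chosen only for illustration. A fine-tuned BERT model would instead use learned projections, many attention heads and layers, and weights trained on labeled CVE descriptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption: queries, keys, and values are the token vectors themselves
    (identity projections); a real transformer learns separate projections.
    """
    d = len(tokens[0])
    attended = []
    for q in tokens:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output is a weighted average of all token vectors.
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)]
        )
    return attended


def classify(tokens, class_weights):
    """Mean-pool the attended tokens and score each (hypothetical) CWE label."""
    attended = self_attention(tokens)
    n, d = len(attended), len(attended[0])
    pooled = [sum(row[j] for row in attended) / n for j in range(d)]
    labels = list(class_weights)
    logits = [sum(w * x for w, x in zip(class_weights[lab], pooled)) for lab in labels]
    return dict(zip(labels, softmax(logits)))


# Hypothetical 2-dimensional embeddings for a three-token CVE description.
toy_tokens = [[2.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# Hypothetical, hand-picked classifier weights for two CWE IDs.
toy_weights = {"CWE-79": [1.0, 0.0], "CWE-89": [0.0, 1.0]}

probs = classify(toy_tokens, toy_weights)
predicted = max(probs, key=probs.get)
```

Here `probs` maps each candidate CWE ID to a probability and `predicted` is the highest-scoring label. No convolution or recurrence appears anywhere in the sketch: every token attends to every other token in a single step.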

Full Text