CWE Prediction Using CVE Description - The Semantic Similarity Approach

Kethan Kota, Manjunatha A, Sree Vivek S

doi:10.1016/j.procs.2024.04.111

Abstract

With the growing number of cyber attacks, vulnerability management has gained prominence. Analyzing vulnerabilities and their underlying weaknesses, as stored in vulnerability databases, is a vital part of vulnerability management. View-1003 has been used to label CWE values for CVE entries in NVD from 2016 onwards. This view contains 130 CWE values arranged in a two-layer hierarchical tree. Yet a vast number of CVEs remain unclassified, because assigning a CWE to a known vulnerability is mostly a manual process. To automate CWE prediction for CVEs, a novel approach is proposed that captures both the hierarchical relationship of CWEs in the MITRE tree and the semantic text similarity between CVE descriptions and CWE information. This is achieved with cross-encoder models, one trained on each layer of the MITRE tree, that capture the semantic similarity of string pairs created from CVE descriptions and CWE information. On test data, the cross-encoder models achieve accuracies of 81.2% and 94.4% for the top layer and bottom layer of the MITRE tree, respectively. A binary classifier model connects the two cross-encoder models as part of the proposed approach; its accuracy is 90.9% on the test data set. The proposed approach achieved an overall test accuracy of 72.1% and a macro-averaged F1-score of 0.735 on 13,896 CVE records. Compared with existing approaches, this work covers more CWEs, uses a larger data set, and provides a highly practical solution.
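The two-layer prediction flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tiny tree, the paraphrased CWE texts, the function names (`toy_score`, `best_match`, `predict_cwe`), and in particular the role given to the binary classifier (deciding whether to refine a top-layer prediction into one of its children), which the abstract does not specify. A token-overlap score stands in for the trained cross-encoders.

```python
from typing import Callable, Dict, List

# Illustrative subset of a two-layer tree: top-layer CWE -> bottom-layer CWEs.
# The texts are short paraphrases, not official CWE descriptions.
CWE_TEXT: Dict[str, str] = {
    "CWE-74": "injection improper neutralization of special elements in output",
    "CWE-89": "sql injection improper neutralization in sql command",
    "CWE-79": "cross-site scripting improper neutralization during web page generation",
    "CWE-119": "improper restriction of operations within bounds of memory buffer",
    "CWE-787": "out-of-bounds write past end of buffer",
}
TREE: Dict[str, List[str]] = {
    "CWE-74": ["CWE-89", "CWE-79"],
    "CWE-119": ["CWE-787"],
}

Scorer = Callable[[str, str], float]


def toy_score(cve_text: str, cwe_text: str) -> float:
    """Jaccard token overlap; a stand-in for a cross-encoder similarity score."""
    a, b = set(cve_text.lower().split()), set(cwe_text.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0


def best_match(cve_text: str, candidates: List[str], score: Scorer) -> str:
    """Return the candidate CWE whose text scores highest against the CVE text."""
    return max(candidates, key=lambda cwe: score(cve_text, CWE_TEXT[cwe]))


def predict_cwe(
    cve_text: str,
    score_top: Scorer,
    score_bottom: Scorer,
    should_descend: Callable[[str, str], bool],
) -> str:
    """Pick a top-layer CWE, then optionally refine it to one of its children."""
    top = best_match(cve_text, list(TREE), score_top)
    children = TREE[top]
    # Assumed role of the binary classifier: gate the move to the bottom layer.
    if not children or not should_descend(cve_text, top):
        return top
    return best_match(cve_text, children, score_bottom)


cve = "sql injection in login form allows remote attackers to execute arbitrary sql command"
refined = predict_cwe(cve, toy_score, toy_score, lambda text, top: True)
coarse = predict_cwe(cve, toy_score, toy_score, lambda text, top: False)
print(refined, coarse)
```

In a real system each scorer would be a separately trained model for its layer, and `should_descend` would be a learned classifier rather than a constant; the sketch only shows how the three components could be chained.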

Full Text