Aspect-level Information Discrepancies across Heterogeneous Vulnerability Reports: Severity, Types and Detection Methods

Jiamou Sun,Qinghua Lu,Xin Xia,Xiwei Xu,Zhenchang Xing,Liming Zhu

doi:10.1145/3624734

Abstract

Vulnerable third-party libraries pose significant threats to software applications that reuse these libraries. At an industry scale of reuse, manual analysis of third-party library vulnerabilities can be easily overwhelmed by the sheer number of vulnerabilities continually collected from diverse sources for thousands of reused libraries. Our study of four large-scale, actively maintained vulnerability databases (NVD, IBM X-Force, ExploitDB, and Openwall) reveals the wide presence of information discrepancies, in terms of seven vulnerability aspects, i.e., product, version, component, vulnerability type, root cause, attack vector, and impact, between the reports for the same vulnerability from heterogeneous sources. It would be beneficial to integrate and cross-validate multi-source vulnerability information, but it demands automatic aspect extraction and aspect discrepancy detection. In this work, we experimented with a wide range of NLP methods to extract named entities (e.g., product) and free-form phrases (e.g., root cause) from textual vulnerability reports and to detect semantically different aspect mentions between the reports. Our experiments confirm the feasibility of applying NLP methods to automate aspect-level vulnerability analysis and identify the need for domain customization of general NLP methods. Based on our findings, we propose a discrepancy-aware, aspect-level vulnerability knowledge graph and a KG-based web portal that integrates diversified vulnerability key aspect information from heterogeneous vulnerability databases. Our conducted user study proves the usefulness of our web portal. Our study opens the door to new types of vulnerability integration and management, such as vulnerability portraits of a product and explainable prediction of silent vulnerabilities.

Full Text