XVDB: A High-Coverage Approach for Constructing a Vulnerability Database

Jihyun Choi, Hyunji Hong, Eunjin Choi, Seunghoon Woo, Heejo Lee

doi:10.1109/access.2022.3197786

Abstract

Security patches play an important role in detecting and fixing one-day vulnerabilities. However, collecting abundant security patches from diverse data sources is not a simple task, because (1) each data source provides vulnerability information in a different way and (2) many security patches cannot be collected directly from Common Vulnerabilities and Exposures (CVE) information (e.g., National Vulnerability Database (NVD) references). In this paper, we propose a high-coverage approach that collects known security patches by tracking multiple data sources. Specifically, we consider three types of data sources: repositories (e.g., GitHub), issue trackers (e.g., Bugzilla), and Q&A sites (e.g., Stack Overflow). From these sources, we also gather security patches that cannot be collected by considering CVE information alone (i.e., previously untracked security patches). In our experiments, we collected 12,432 CVE patches from repositories and issue trackers, and 12,458 insecure posts from Q&A sites. We collected at least four times more CVE patches than existing approaches, which demonstrates the efficacy of our approach. The collected security patches serve as a database on a public website (i.e., IoTcube) for detecting vulnerable code clones.

Full Text