Abstract

As nations establish more vulnerability databases, these databases have accumulated hundreds of thousands of security vulnerability reports, which play a crucial role in protecting system security. However, many databases lack essential information, contain inaccuracies, or are inconsistent with one another. Despite these problems, the importance of vulnerability databases continues to grow. Existing research on vulnerability databases has focused on software versions and vulnerability reproduction, while software names, an essential component of vulnerability databases, have not been studied extensively. Understanding how consistent software names are across vulnerability databases is crucial for improving the accuracy of those databases. This paper introduces VERNIER, an automated method for measuring inconsistencies in 789,954 sets of software names drawn from nine security vulnerability databases (including CVE and NVD) and covering 1999 to 2019. We use a named entity recognition (NER) model that achieves 99.5% accuracy and a 95.1% F1 score to extract software names from unstructured Chinese and English vulnerability reports. VERNIER assesses the inconsistency of software names at the character level and at the semantic level. The results indicate that inconsistent software names are prevalent in vulnerability databases: the average exact matching rate between NVD and other mainstream databases, such as CVE, is only 20.3% at the character level and 43.3% at the semantic level. We also discover internal inconsistencies between the structured and unstructured software names within a single vulnerability database (e.g., NVD). To mitigate the inconsistency, we implement an alert tool that uses these inconsistencies to detect incorrect software names. The tool can effectively flag and correct such names.
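
To illustrate what an exact matching rate over pairs of software names means, the following is a minimal sketch and not VERNIER's actual procedure, which the abstract does not specify. The function names, the normalization rule, and the sample name pairs are all assumptions made for illustration. In particular, `normalized_match` is only a crude proxy for a semantic-level comparison: it ignores case and separator characters, whereas a real semantic comparison would likely need to handle aliases and vendor prefixes as well.

```python
import re

def char_match(a: str, b: str) -> bool:
    """Character-level exact match: the two name strings are identical."""
    return a == b

def normalize(name: str) -> str:
    """Hypothetical normalization: lowercase, treat any run of
    non-alphanumeric characters as a single space."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())

def normalized_match(a: str, b: str) -> bool:
    """Crude stand-in for a semantic-level match (an assumption,
    not the paper's method): names are equal after normalization."""
    return normalize(a) == normalize(b)

def matching_rate(pairs, match) -> float:
    """Fraction of name pairs for which `match` returns True."""
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if match(a, b)) / len(pairs)

# Invented example pairs: (name in database A, name in database B)
# recorded for the same vulnerability. Not taken from the paper's data.
pairs = [
    ("OpenSSL", "OpenSSL"),
    ("Internet Explorer", "internet_explorer"),
    ("Apache Tomcat", "Tomcat"),
    ("WordPress", "Word-Press"),
]

print(matching_rate(pairs, char_match))        # 0.25
print(matching_rate(pairs, normalized_match))  # 0.5
```

In this toy data the looser comparison recovers one extra pair ("Internet Explorer" and "internet_explorer"), which mirrors the direction of the gap reported between the character-level and semantic-level rates, while pairs such as "Apache Tomcat" and "Tomcat" remain unmatched under both.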

