Abstract

Common software weaknesses, such as improper input validation, integer overflow, can harm system security directly or indirectly, causing adverse effects such as denial-of-service, execution of unauthorized code. Common Weakness Enumeration (CWE) maintains a standard list and classification of common software weakness. Although CWE contains rich information about software weaknesses, including textual descriptions, common sequences and relations between software weaknesses, the current data representation, i.e., hyperlined documents, does not support advanced reasoning tasks on software weaknesses, such as prediction of missing relations and common consequences of CWEs. Such reasoning tasks become critical to managing and analyzing large numbers of common software weaknesses and their relations. In this paper, we propose to represent common software weaknesses and their relations as a knowledge graph, and develop a translation-based, description-embodied knowledge representation learning method to embed both software weaknesses and their relations in the knowledge graph into a semantic vector space. The vector representations (i.e., embeddings) of software weaknesses and their relations can be exploited for knowledge acquisition and inference. We conduct extensive experiments to evaluate the performance of software weakness and relation embeddings in three reasoning tasks, including CWE link prediction, CWE triple classification, and common consequence prediction. Our knowledge graph embedding approach outperforms other description- and/or structure-based representation learning methods.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call