CSGVD: A deep learning approach combining sequence and graph embedding for source code vulnerability detection

Wei Tang,Mingwei Tang,Minchao Ban,Ziguo Zhao,Mingjun Feng

doi:10.1016/j.jss.2023.111623

Wei Tang, Mingwei Tang + Show 3 more

https://doi.org/10.1016/j.jss.2023.111623

Copy DOI

Export

Save

Cite

Abstract
Full-Text
Similar Papers

Abstract

Listen

In order to secure software, it is critical to detect potential vulnerabilities. The performance of traditional static vulnerability detection methods is limited by predefined rules, which rely heavily on the expertise of developers. Existing deep learning-based vulnerability detection models usually use only a single sequence or graph embedding approach to extract vulnerability features. Sequence embedding-based models ignore the structured information inherent in the code, and graph embedding-based models lack effective node and graph embedding methods. As a result, we propose a novel deep learning-based approach, CSGVD (Combining Sequence and Graph embedding for Vulnerability Detection), which considers function-level vulnerability detection as a graph binary classification task. Firstly, we propose a PE-BL module, which inherits and enhances the knowledge from the pre-trained language model. It extracts the code’s local semantic features as node embedding in the control flow graph by using sequence embedding. Secondly, CSGVD uses graph neural networks to extract the structured information of the graph. Finally, we propose a mean biaffine attention pooling, M-BFA, to better aggregate node information as a graph’s feature representation. The experimental results show that CSGVD outperforms the existing state-of-the-art models and obtains 64.46% accuracy on the real-world benchmark dataset from CodeXGLUE for vulnerability detection.

Full Text