Code Property Graph-Based Vulnerability Dataset Generation for Source Code Detection

Zhibin Guan, Jiajie Wang, Wei Xin, Xiaomeng Wang

doi:10.1007/978-981-15-9739-8_43

Abstract

Most existing deep learning-based source code vulnerability detection methods focus on designing different deep learning algorithms to improve detection accuracy, while ignoring two obvious problems: first, the lack of sufficient source code vulnerability data; second, the lack of high-quality code data for deep learning algorithms. We therefore propose a code attribute graph (code property graph)-based data generation method for deep learning-based source code vulnerability detection. The proposed method represents source code as a code attribute graph, from which the control flow information and data flow information of the source code are extracted in sequence. Code data are then generated by a retrieval-matching method. The advantage of this method is that it extracts rich semantic information from source code, and the generated code slices can be used by deep learning algorithms directly. Experimental results show that the proposed method can generate a large amount of high-quality code data for deep learning-based source code vulnerability detection.

Keywords: Vulnerability dataset generation; Code property graph/Code attribute graph; Data flow; Control flow; Source code vulnerability detection; Deep learning
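The general idea of slicing over a graph with control flow and data flow edges can be illustrated with a minimal sketch. This is not the authors' implementation: the toy program, the hand-written edge list, and the helper names `find_sinks` and `backward_slice` are all hypothetical, and the name-matching step is only an assumed stand-in for the retrieval-matching method, which the abstract does not specify. A real code property graph would be produced by a parser rather than written by hand.

```python
from collections import defaultdict

# Hypothetical toy program: one statement per node id.
STATEMENTS = {
    1: "char buf[8];",
    2: "char *src = read_input();",
    3: "int n = strlen(src);",
    4: 'log_message("copying");',
    5: "strcpy(buf, src);",
}

# Hand-written edges (source node, target node, kind). Assumption: a real
# graph would be generated by a code analysis tool, not listed manually.
EDGES = [
    (1, 2, "control"), (2, 3, "control"), (3, 4, "control"), (4, 5, "control"),
    (1, 5, "data"),  # buf is defined at 1 and used at 5
    (2, 3, "data"),  # src is defined at 2 and used at 3
    (2, 5, "data"),  # src is defined at 2 and used at 5
]


def find_sinks(statements, api_names=("strcpy",)):
    """Return node ids whose text calls one of the given (assumed) risky APIs."""
    return [
        node_id
        for node_id, text in statements.items()
        if any(name + "(" in text for name in api_names)
    ]


def backward_slice(edges, sink, kinds=("data",)):
    """Collect every node that reaches `sink` through edges of the given kinds."""
    predecessors = defaultdict(set)
    for src, dst, kind in edges:
        if kind in kinds:
            predecessors[dst].add(src)

    seen = {sink}
    stack = [sink]
    while stack:
        node = stack.pop()
        for pred in predecessors[node]:
            if pred not in seen:
                seen.add(pred)
                stack.append(pred)
    return sorted(seen)


# Build one code slice per matched sink, following data flow edges only.
slices = {
    sink: [STATEMENTS[i] for i in backward_slice(EDGES, sink)]
    for sink in find_sinks(STATEMENTS)
}
```

In this sketch, following only data flow edges drops the statements that do not influence the arguments of the matched call, while passing `kinds=("data", "control")` keeps every statement on a path to it. Which edge kinds to follow, and in what order, is a design choice of the actual method and is not fixed by this example.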

Full Text