AbstractThe rapid expansion of smart devices leads to the increasing demand for vulnerability detection in the cyber security field. Writing secure source codes is crucial to protect applications and software. Recent vulnerability detection methods are mainly using machine learning and deep learning. However, there are still some challenges, how to learn accurate source code semantic embedding at the function level, how to effectively perform vulnerability detection using the learned semantic embedding of source code and how to solve the overfitting problem of learning‐based models. In this paper, we consider codes as various graphs with node features and propose a tensor‐based gated graph neural network called TensorGNN to produce code embedding for function‐level vulnerability detection. First, we propose a high‐dimensional tensor for combining different code graph representations. Second, inspired by the work of tensor technology, we propose the TensorGNN model to produce accurate code representations using the graph tensor. We evaluate our model on 7 C and C++ large open‐source code corpus (e.g. SARD&NVD, Debian, SATE IV, FFmpeg, libpng&LibTiff, Wireshark and Github datasets), which contains 13 types of vulnerabilities. Our TensorGNN model improves on existing state‐of‐the‐art works by 10%–30% on average in terms of vulnerability detection accuracy and F1, while our TensorGNN model needs less training time and model parameters. Specifically, compared with other existing works, our model reduces 25–47 times of the number of parameters and decreases 3–10 times of training time. Results of evaluations show that TensorGNN has better performance while using fewer training parameters and less training time.
Read full abstract