Abstract

Automatic vulnerability detection in program source code is an important research topic. With the development of artificial intelligence, deep learning has been applied to vulnerability detection. However, existing methods treat source code as plain text and do not make full use of its syntactic structure, which introduces considerable redundancy. Moreover, to avoid the computational overhead caused by this redundancy, existing methods often truncate variable-length inputs, which causes data loss. In this paper, we propose a data processing method based on the abstract syntax tree (AST) that extracts all syntactic features while reducing data redundancy. In addition, we apply the pack-padded method to a bidirectional GRU (Bi-GRU) network so that variable-length data can be trained on without truncation or padding. Unlike current methods, our framework does not rely on experts or predefined rules, so it is suited to processing large amounts of source code. To evaluate our framework, we collected a vulnerability dataset of more than 260,000 functions covering 118 CWE types, which is larger than the datasets used in existing research. Experiments show that our framework outperforms existing methods.
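
To illustrate why an AST view carries less redundancy than raw text, here is a minimal sketch. The abstract does not name the parser, the target language, the traversal order, or which node attributes are kept. As a stand-in, this sketch uses Python's standard `ast` module on Python code, records only node-type names in depth-first pre-order, and drops identifiers. The helper `syntax_sequence` is hypothetical. Two versions of a function that differ only in comments, blank lines, and spacing produce different text but the same syntax sequence.

```python
from __future__ import annotations

import ast


def syntax_sequence(source: str) -> list[str]:
    """Hypothetical helper: AST node-type names in depth-first pre-order.

    Assumption: only node types are kept; identifiers and literals are dropped.
    """
    tree = ast.parse(source)
    sequence: list[str] = []

    def visit(node: ast.AST) -> None:
        # Load/Store/Del context markers add no structure here, so skip them.
        if not isinstance(node, ast.expr_context):
            sequence.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return sequence


# The same function written two ways: comments, blank lines and spacing
# change the plain text but not the syntax tree.
compact = "def f(buf, n):\n    return buf[n]\n"
noisy = "def f(buf, n):  # read one element\n\n    return   buf[ n ]\n"

print(syntax_sequence(compact) == syntax_sequence(noisy))  # True
```

A real pipeline for C/C++ functions would swap `ast.parse` for a parser of that language. The pre-order walk and the choice of which node fields to keep are the design decisions that determine how much redundancy is removed.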

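The pack-padded idea can be shown without a deep-learning framework. In PyTorch, this role is played by `torch.nn.utils.rnn.pack_padded_sequence`. The pure-Python sketch below is only an illustration of the same layout and is not the authors' code. It sorts the sequences longest first, groups tokens by time step, and records how many sequences are still active at each step. A recurrence then touches only real tokens. The helper names and the toy `step` function, which stands in for a GRU cell, are hypothetical. A Bi-GRU would also run a backward pass starting from each sequence's own last token, and that pass is not shown here.

```python
from __future__ import annotations

from typing import Callable, List, Sequence, Tuple


def pack_sequences(seqs: Sequence[Sequence[int]]) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical packing helper (no padding stored).

    Returns (data, batch_sizes, sorted_indices):
      data           tokens ordered by time step, longest sequence first
      batch_sizes[t] number of sequences that still have a token at step t
      sorted_indices original position of each sequence after sorting
    """
    sorted_indices = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    max_len = len(seqs[sorted_indices[0]]) if seqs else 0
    data: List[int] = []
    batch_sizes: List[int] = []
    for t in range(max_len):
        active = [i for i in sorted_indices if len(seqs[i]) > t]
        batch_sizes.append(len(active))
        data.extend(seqs[i][t] for i in active)
    return data, batch_sizes, sorted_indices


def run_packed(data: List[int], batch_sizes: List[int],
               step: Callable[[float, int], float], h0: float = 0.0) -> List[float]:
    """Run a toy recurrence over packed data; returns final states in sorted order."""
    hidden = [h0] * (batch_sizes[0] if batch_sizes else 0)
    offset = 0
    for bs in batch_sizes:
        # Only the first `bs` (longest) sequences have a real token at this step.
        for j in range(bs):
            hidden[j] = step(hidden[j], data[offset + j])
        offset += bs
    return hidden


def unsort(values: List[float], sorted_indices: List[int]) -> List[float]:
    """Restore the original sequence order."""
    out = [0.0] * len(values)
    for pos, original in enumerate(sorted_indices):
        out[original] = values[pos]
    return out


seqs = [[1, 2], [3, 4, 5], [6]]
data, batch_sizes, order = pack_sequences(seqs)
print(data, batch_sizes)  # [3, 1, 6, 4, 2, 5] [3, 2, 1]
final = unsort(run_packed(data, batch_sizes, lambda h, x: h + x), order)
print(final)  # [3.0, 12.0, 6.0]
```

Packing stores 6 tokens where a padded 3 × 3 batch would store 9. Each final state depends only on that sequence's own tokens, which is why neither truncation nor padding leaks into the result.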