Efficient Vulnerability Detection Based on Abstract Syntax Tree and Deep Learning

Hantao Feng, Yuqing Zhang, He Wang, Hongyu Sun, Xiaotong Fu

doi:10.1109/infocomwkshps50562.2020.9163061

Abstract

Automatic vulnerability detection in program source code is an important research topic. With the development of artificial intelligence, deep learning has been applied to vulnerability detection. Existing methods do not make full use of the syntactic structure of program source code: they treat the code as plain text, which introduces considerable redundancy. Moreover, to avoid the computational overhead caused by this redundancy, existing methods often truncate variable-length data, which also causes data loss. In this paper, we propose a data processing method based on the abstract syntax tree that extracts all syntax features and reduces data redundancy. In addition, we apply the pack-padded method to a Bi-GRU network to train on variable-length data without truncation or padding. Unlike current methods, our framework does not rely on experts or predefined rules, so it is suitable for processing large amounts of source code. To evaluate our framework, we collect a vulnerability dataset that includes more than 260,000 functions covering 118 CWE types, which is larger than the datasets used in existing research. Experiments show that our framework outperforms existing methods.
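
The abstract does not spell out how the pack-padded method lays out its data. As a minimal sketch, assuming it refers to the packed-sequence layout offered by common deep learning libraries (for example PyTorch's `pack_padded_sequence`), the idea can be illustrated in plain Python: sequences are ordered by length and stored time step by time step, so a recurrent network only processes the sequences that are still active at each step. The function name `pack_sequences` and the token ids below are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def pack_sequences(
    seqs: Sequence[Sequence[int]],
) -> Tuple[List[int], List[int], List[int]]:
    """Pack variable-length sequences in time-major order (illustrative only).

    Returns (data, batch_sizes, order):
      data        - tokens laid out step by step, longest sequences first
      batch_sizes - number of sequences still active at each time step
      order       - original indices of the sequences, longest first
    """
    # Sort indices by sequence length, longest first (ties keep input order).
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    max_len = len(seqs[order[0]]) if order else 0

    data: List[int] = []
    batch_sizes: List[int] = []
    for t in range(max_len):
        # Only sequences that still have a token at step t take part.
        active = [i for i in order if len(seqs[i]) > t]
        batch_sizes.append(len(active))
        data.extend(seqs[i][t] for i in active)
    return data, batch_sizes, order


# Hypothetical token-id sequences, e.g. serialized ASTs of three functions.
functions = [[5, 9], [7, 3, 8, 2], [4]]
data, batch_sizes, order = pack_sequences(functions)

# The packed layout stores exactly one entry per real token.
packed_size = len(data)
# Padding every sequence to the longest length would store this many entries.
padded_size = len(functions) * max(len(f) for f in functions)
```

In this toy batch the packed layout holds one entry per real token, while padding each sequence to the longest length would hold `padded_size` entries, and truncating to a shorter fixed length would discard tokens from the longest function.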

Full Text