Comparative analysis of approaches to source code vulnerability detection based on deep learning methods

Gennadiy Kyselov, Yevhenii Kubiuk

doi:10.15587/2706-5448.2021.233534

Abstract

The object of research is deep learning methods for source code vulnerability detection. One of the most problematic areas is the reliance on a single approach in the code analysis process: either the approach based on the abstract syntax tree (AST) or the approach based on the program dependence graph (PDG). This paper presents a comparative analysis of these two approaches and of the neural network topologies used in each. The comparison identified the advantages and disadvantages of each approach, and the results are summarized in the corresponding comparison tables. The analysis determined that BLSTM (Bidirectional Long Short-Term Memory) and BGRU (Bidirectional Gated Recurrent Unit) networks give the best results for source code vulnerability detection. It also showed that the most effective approach for source code vulnerability detection systems is a method that uses an intermediate representation of the code, which makes it possible to build a language-independent tool. In addition, the authors propose their own algorithm for a source code analysis system that can perform the following operations: predict a source code vulnerability, classify it, and generate a corresponding patch for it. A detailed analysis of the proposed system’s unresolved issues is provided; these issues are planned for investigation in future research. The proposed system could help speed up the software development process and reduce the number of vulnerabilities in software code. Software developers, as well as specialists in the field of cybersecurity, can be stakeholders of the proposed system.

Highlights

Nowadays, information technologies are used in almost all spheres of human activity
One of the most problematic areas is the use of only one approach in the code analysis process: the approach based on the AST or the approach based on the program dependence graph (PDG)
The most effective approach for source code vulnerability detection systems is a method that uses an intermediate representation of the code, which allows getting a language-independent tool

Summary

Introduction

Information technologies are used in almost all spheres of human activity. As a result, the need for high-quality software is constantly growing. With the development of machine learning, algorithms were created that use statistical and machine learning models to predict vulnerabilities in code [3, 4]. The weakness of this approach is that it requires a technical expert to set up the system manually, for example, to create a dictionary of the language syntax, add information about grammatical structures, etc. Models based on deep learning do not have these disadvantages. The aim of this research is to conduct a comparative analysis of existing deep learning approaches to source code vulnerability detection.
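To make the AST-based approach named in the abstract more concrete, the following minimal sketch shows how source code might be turned into a sequence of AST node types and then into integer ids, the kind of input a sequence model such as a BLSTM could consume. It is an illustration only, not the authors' pipeline: it assumes Python's standard `ast` module as the parser (so it handles Python source only), and the function names `ast_node_sequence`, `build_vocabulary`, and `encode` are hypothetical.

```python
import ast
from typing import Dict, Iterable, List


def ast_node_sequence(source: str) -> List[str]:
    """Return AST node type names of `source` in depth-first pre-order."""
    tree = ast.parse(source)
    sequence: List[str] = []

    def visit(node: ast.AST) -> None:
        # Record the node type, then descend into its children in field order.
        sequence.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return sequence


def build_vocabulary(sequences: Iterable[List[str]]) -> Dict[str, int]:
    """Map each node type name to an integer id; id 0 is left free for padding."""
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for name in seq:
            vocab.setdefault(name, len(vocab) + 1)
    return vocab


def encode(sequence: List[str], vocab: Dict[str, int]) -> List[int]:
    """Replace node type names with their integer ids (assumes all names are in vocab)."""
    return [vocab[name] for name in sequence]


if __name__ == "__main__":
    # Hypothetical sample: a call to eval on user input, a commonly flagged pattern.
    sample = "data = input()\nresult = eval(data)\n"
    nodes = ast_node_sequence(sample)
    vocab = build_vocabulary([nodes])
    print(nodes)
    print(encode(nodes, vocab))
```

A sequence like this discards identifiers and literal values, which is one reason AST-only representations can miss data-flow information that a PDG would capture.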

Methods of research

Research results and discussion

Conclusions