A code clone detection algorithm based on graph convolution network with AST tree edge

Zhicheng Lu, Huamiao Hu, Wen-An Zhou, Ruochen Li

doi:10.1109/qrs-c55045.2021.00156

Abstract

Code clones can introduce risks such as vulnerabilities and intellectual property disputes into complex software systems, including large-scale defense and commercial software; detecting them helps mitigate these risks. In deep-learning-based code clone detection, neural networks such as Tree-CNN and Tree-LSTM, which extract features from the AST (abstract syntax tree), cannot gather global information from the nodes above and below a given node, so information does not flow across the whole tree; graph neural networks avoid this problem. This paper presents a method for adding edges to the AST and uses a GCN (Graph Convolutional Network) and a GAT (Graph Attention Network) to extract code feature vectors. Experiments on the BigCloneBench dataset, evaluated with several common binary classification metrics and an analysis of time consumption, indicate that graph neural networks significantly improve both the effectiveness and the time efficiency of code clone detection, especially for code fragments with completely different semantics.
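The idea of turning an AST into a graph and passing node features along its edges can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: it parses Python source with the standard `ast` module (BigCloneBench itself consists of Java code), it adds next-sibling edges as one plausible kind of extra edge (the abstract does not say which edges the paper adds), and it uses an untrained GCN-style propagation step with the learned weight matrix and nonlinearity omitted. The function names `build_graph`, `gcn_propagate`, `embed`, and `similarity` are hypothetical.

```python
import ast
import math


def build_graph(source):
    """Return (node_types, edges) for the AST of `source`.

    Edges are index pairs: parent-child edges from the tree plus
    next-sibling edges (an assumed choice of extra edge).
    """
    types, edges = [], []

    def visit(node):
        idx = len(types)
        types.append(type(node).__name__)
        prev = None
        for child in ast.iter_child_nodes(node):
            cidx = visit(child)
            edges.append((idx, cidx))        # tree edge
            if prev is not None:
                edges.append((prev, cidx))   # assumed sibling edge
            prev = cidx
        return idx

    visit(ast.parse(source))
    return types, edges


def gcn_propagate(features, edges):
    """One symmetric-normalized propagation step over an undirected graph
    with self-loops. No learned weights: this is an untrained sketch."""
    n = len(features)
    neighbors = [{i} for i in range(n)]      # self-loops
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    deg = [len(s) for s in neighbors]
    dim = len(features[0])
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j in neighbors[i]:
            w = 1.0 / math.sqrt(deg[i] * deg[j])
            for k in range(dim):
                row[k] += w * features[j][k]
        out.append(row)
    return out


def embed(types, edges, vocab, layers=2):
    """One-hot node types, propagate `layers` times, then mean-pool."""
    feats = [[1.0 if t == v else 0.0 for v in vocab] for t in types]
    for _ in range(layers):
        feats = gcn_propagate(feats, edges)
    return [sum(row[k] for row in feats) / len(feats) for k in range(len(vocab))]


def similarity(src_a, src_b):
    """Cosine similarity between the pooled vectors of two fragments."""
    ta, ea = build_graph(src_a)
    tb, eb = build_graph(src_b)
    vocab = sorted(set(ta) | set(tb))
    u, v = embed(ta, ea, vocab), embed(tb, eb, vocab)
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Calling `similarity("def f(a, b):\n    return a + b", "def g(x, y):\n    return x + y")` compares two fragments that differ only in identifier names; because this sketch encodes node types and structure only, such pairs receive identical vectors. A trained model along the lines the abstract describes would instead learn its weights from labeled clone pairs and feed the pooled vectors to a binary classifier.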

Full Text