GRRLN: Gated Recurrent Residual Learning Networks for code clone detection

Xiangping Zhang, Jianxun Liu, Min Shi

doi:10.1002/smr.2649

Abstract

Code clone detection is a critical problem in software development and maintenance. It aims to identify functionally identical or similar code fragments within an application. Existing works formulate code clone detection as a binary classification problem, predicting whether a code pair is a clone based on a pre-defined threshold. In practice, there are several types of code clones, distinguished by how similar the two code fragments are to each other. To investigate how different formulations of the task affect detection results, we propose Gated Recurrent Residual Learning Networks (GRRLN), a novel neural network model for code clone detection. To train GRRLN, we first represent each code fragment as a sequence of statement-level trees derived from the whole abstract syntax tree (AST). Then, a gated recurrent neural network with residual connections extracts the semantics of the individual statement trees together with their dependency relationships across the input statement sequence. Finally, the code fragment representations output by GRRLN are used for similarity calculation and clone detection. We evaluate GRRLN on two real-world datasets for code clone detection and clone type classification. Experiments show that GRRLN achieves promising results while requiring significantly less time and memory than state-of-the-art methods.
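
The first and last steps described in the abstract (splitting a fragment into a statement-level tree sequence, then comparing fragment representations against a threshold) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: Python's standard `ast` module stands in for whatever parser the authors use, the helper names (`statement_trees`, `header_node_types`, `encode`, `similarity`, `is_clone`) are hypothetical, the threshold of 0.8 is arbitrary, and a simple count of AST node types replaces the gated recurrent residual encoder, which is not reproduced.

```python
import ast
import math
from collections import Counter


def statement_trees(source):
    """Return the statement-level subtrees of a fragment in source order.

    A compound statement (def/for/if ...) appears first, followed by the
    statements nested in its body.
    """
    sequence = []

    def visit(node):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.stmt):
                sequence.append(child)
            visit(child)

    visit(ast.parse(source))
    return sequence


def header_node_types(stmt):
    """Node type names of one statement tree, stopping at nested statements."""
    names = [type(stmt).__name__]
    stack = [c for c in ast.iter_child_nodes(stmt) if not isinstance(c, ast.stmt)]
    while stack:
        node = stack.pop()
        names.append(type(node).__name__)
        stack.extend(
            c for c in ast.iter_child_nodes(node) if not isinstance(c, ast.stmt)
        )
    return names


def encode(source):
    """Stand-in encoder (NOT GRRLN): counts node types over all statement trees."""
    counts = Counter()
    for stmt in statement_trees(source):
        counts.update(header_node_types(stmt))
    return counts


def similarity(source_a, source_b):
    """Cosine similarity between the two stand-in representations."""
    a, b = encode(source_a), encode(source_b)
    dot = sum(a[key] * b[key] for key in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_clone(source_a, source_b, threshold=0.8):
    """Binary formulation: a pair is a clone if similarity reaches the threshold."""
    return similarity(source_a, source_b) >= threshold


# Two fragments that differ only in identifier names, plus an unrelated one.
FRAG_A = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
FRAG_B = "def acc(values):\n    r = 0\n    for v in values:\n        r += v\n    return r\n"
FRAG_C = "def greet(name):\n    print('hello', name)\n"

if __name__ == "__main__":
    print([type(s).__name__ for s in statement_trees(FRAG_A)])
    print(is_clone(FRAG_A, FRAG_B), similarity(FRAG_A, FRAG_C))
```

Because the stand-in encoder ignores identifier names, the renamed pair `FRAG_A` and `FRAG_B` receives the same representation and is reported as a clone, while the unrelated `FRAG_C` scores lower. A learned encoder such as the one the abstract describes would replace `encode`, and clone type classification would replace the single threshold with a multi-class decision.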

Full Text