Abstract

Code clone detection plays a critical role in software engineering. Finding “functional” clones by hand requires extensive development experience, which puts novice developers at a disadvantage. Although many approaches have been proposed to detect code clones automatically, their results are not satisfactory. A major reason is that it is difficult to extract syntactic and semantic information from source code. To resolve this problem, in this article we develop a novel graph representation approach to detect functional code clones. The graph representation is built from intermediate code compiled from the source code. With it, we can readily apply graph embedding techniques to extract syntactic and semantic features from the abstract syntax tree, control flow graph, and data flow graph generated from the intermediate code. We then use a Softmax classifier to detect functional code clone pairs. We evaluate the proposed approach on the code clone detection task using the BigCloneBench dataset. To improve performance, the embedded representation of the intermediate code is initialized with vectors pretrained on a previously collected LLVM IR dataset. The experimental results show that our intermediate code-based graph approach performs better than existing functional code clone detection approaches. For type-4 code clone detection in particular, our approach outperforms the baseline approaches by an average of 33.49% in terms of F1 score.
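The final classification and scoring steps mentioned in the abstract can be illustrated with a minimal sketch. It assumes each code fragment has already been reduced to a fixed-length embedding vector. The pair features (element-wise absolute difference), the names `classify_pair` and `f1_score`, and the weights are hypothetical choices for illustration, not the authors' implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_pair(vec_a, vec_b, weights, bias):
    """Return [p_not_clone, p_clone] for two fragment embeddings.

    Assumption: the pair is summarized by the element-wise absolute
    difference of the two embeddings, followed by one linear layer.
    """
    features = [abs(a - b) for a, b in zip(vec_a, vec_b)]
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)

def f1_score(predicted, actual):
    """F1 = harmonic mean of precision and recall for binary labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy usage with made-up embeddings and weights (illustrative only).
probs = classify_pair(
    [0.2, 0.9], [0.2, 0.9],
    weights=[[1.0, 1.0], [-1.0, -1.0]],
    bias=[0.0, 1.0],
)
is_clone = probs[1] > probs[0]
```

In a trained model the weights and bias would be learned jointly with the graph embeddings; here they are fixed by hand so the example runs without any dependencies.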
