Java Code Clone Detection by Exploiting Semantic and Syntax Information From Intermediate Code-Based Graph

Dawei Yuan, Zhou Xu, Sen Fang, Xiapu Luo, Tao Zhang

doi:10.1109/tr.2022.3176922

Abstract

Code clone detection plays a critical role in software engineering. Finding “functional” clone code by hand requires rich development experience, which makes the task difficult for novice developers. Although many approaches have been proposed to detect code clones automatically, their results are not yet satisfactory. A major reason is that it is difficult to extract syntax and semantic information from source code. To address this problem, in this article we develop a novel graph representation approach for detecting functional code clones. The graph representation is built from intermediate code compiled from the source code. With it, we can readily apply graph embedding techniques to extract syntactic and semantic features from the abstract syntax tree (AST), control flow graph (CFG), and data flow graph (DFG) generated from the intermediate code. We then use a Softmax classifier to detect functional code clone pairs. We evaluate the proposed intermediate code-based graph representation on the code clone detection task using the BigCloneBench dataset. To improve performance, the embedded representation of the intermediate code is initialized with pretrained vectors learned in advance from a collected LLVM IR dataset. The experimental results show that our intermediate code-based graph approach performs better than existing functional code clone detection approaches. In particular, for type-4 code clone detection, our approach outperforms the baseline approaches by an average of 33.49% in terms of F1 score.
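The abstract does not give implementation details, so the following is only a minimal, hypothetical sketch of the pipeline it outlines: AST, CFG, and DFG edges over the same intermediate-code nodes are merged into one graph, the graph is embedded, and a code pair is scored with a Softmax classifier and evaluated with F1. Several pieces are assumptions and are not the authors' method. The MD5-derived `token_vector` stands in for the pretrained LLVM IR vectors. Mean-aggregation message passing is a generic substitute for the paper's graph embedding technique. The pair feature (`|g1 - g2|` concatenated with `g1 * g2`) and the untrained weights are illustrative only. All function names are hypothetical.

```python
import hashlib
import math
from collections import defaultdict

DIM = 8  # hypothetical embedding size


def token_vector(token: str) -> list[float]:
    """Hypothetical stand-in for a pretrained IR token embedding:
    a deterministic vector derived from the token's MD5 digest."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return [(b / 255.0) - 0.5 for b in digest[:DIM]]


def build_graph(nodes, ast_edges, cfg_edges, dfg_edges):
    """Merge AST, CFG and DFG edges over the same IR nodes into one
    undirected adjacency map (duplicate edges collapse into one)."""
    adj = defaultdict(set)
    for edges in (ast_edges, cfg_edges, dfg_edges):
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
    return {i: sorted(adj[i]) for i in range(len(nodes))}


def embed_graph(nodes, adj, rounds=2):
    """Generic mean-aggregation message passing, then a mean readout
    over all nodes to get one fixed-size graph vector."""
    h = [token_vector(tok) for tok in nodes]
    for _ in range(rounds):
        new_h = []
        for i in range(len(nodes)):
            group = [h[i]] + [h[j] for j in adj[i]]
            new_h.append([sum(col) / len(group) for col in zip(*group)])
        h = new_h
    return [sum(col) / len(h) for col in zip(*h)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def clone_probability(g1, g2, weights, bias):
    """Linear layer + softmax over the classes [non-clone, clone].
    The pair feature |g1 - g2| ++ g1 * g2 is an assumption."""
    feat = [abs(a - b) for a, b in zip(g1, g2)] + [a * b for a, b in zip(g1, g2)]
    logits = [sum(w * x for w, x in zip(row, feat)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


def f1_score(y_true, y_pred):
    """F1 for the positive (clone) class; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical IR opcodes and edges for a tiny function.
    nodes = ["alloca", "store", "load", "add", "ret"]
    adj = build_graph(nodes, [(0, 1)], [(1, 2), (2, 3), (3, 4)], [(1, 2), (2, 3)])
    g = embed_graph(nodes, adj)
    # Untrained, illustrative weights: small |g1 - g2| favors "clone".
    W = [[5.0] * DIM + [0.0] * DIM, [-5.0] * DIM + [0.0] * DIM]
    print(clone_probability(g, g, W, [0.0, 0.1]))
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

Merging the three edge sets into one adjacency map is what lets a single embedding pass see both syntactic (AST) and semantic (CFG, DFG) structure. In a real system the weights would be learned from labeled clone pairs such as those in BigCloneBench.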


Journal: IEEE Transactions on Reliability
Publication Date: Jun 1, 2023
Citations: 6

Similar Papers

Combining Holistic Source Code Representation with Siamese Neural Networks for Detecting Code Clones
Smit Patel ... Roopak Sinha
01 Jan 2021

A collaborative method for code clone detection using a deep learning model
S Karthik ... B Rajdeepa
Advances in Engineering Software | VOL. 174
01 Nov 2022

Semantic Code Clone Detection Based on Community Detection
Zexuan Wan ... Chunli Xie
International Journal of Software Engineering and Knowledge Engineering | VOL. 34
26 Jul 2024

Code Clone Detection Method Based on the Combination of Tree-Based and Token-Based Methods
Ryota Ami ... Hirohide Haga
Journal of Software Engineering and Applications | VOL. 10
01 Jan 2017
