SCCD-GAN: An Enhanced Semantic Code Clone Detection Model Using GAN

Kun Xu, Yan Liu

doi:10.1109/icece54449.2021.9674552

Abstract

A code clone is a pair of code fragments in a code base that are semantically similar, whether they are syntactically similar or different. Excessive code clones in a software system negatively affect its development and maintenance. In recent years, as deep learning has become an active research area within machine learning, researchers have applied deep learning techniques to code clone detection. They have proposed a series of detection techniques that use unstructured information (code as sequences of tokens) and structured information (code as abstract syntax trees and control-flow graphs) to detect semantically similar but syntactically different clones, the most difficult clone type to detect. Although these methods have substantially improved the precision of semantic code clone detection, their false positive rate (FPR) remains very high, which prevents them from being applied effectively to real-world code bases. This paper proposes SCCD-GAN, an enhanced semantic code clone detection model that is based on a graph representation of programs and uses a Graph Attention Network to measure the similarity of code pairs, achieving a lower detection FPR than existing methods. We build the graph representation of the code by augmenting the original abstract syntax tree with control-flow and data-flow information, and we equip the model with an attention mechanism that focuses on the code parts and features that contribute most to detection precision. We implemented and evaluated the proposed method on benchmark datasets in the field of code clone detection, BigCloneBench2 and Google Code Jam. SCCD-GAN performed better than existing state-of-the-art methods in terms of precision and false positive rate.
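The idea of augmenting an abstract syntax tree with control-flow and data-flow edges can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe their tooling or edge definitions, so the function name `build_flow_augmented_graph`, the edge labels (`ast_child`, `next_stmt`, `data_flow`), and the use of Python's standard `ast` module are all assumptions chosen for convenience. Control flow is approximated by linking consecutive statements in a block, and data flow by linking the most recent textual assignment of a variable to a later read. The Graph Attention Network that would consume such a graph is not shown.

```python
import ast


def build_flow_augmented_graph(source: str):
    """Build a toy flow-augmented AST for a Python snippet.

    Returns (nodes, edges):
      nodes: list of AST node type names, indexed by node id
      edges: set of (src_id, dst_id, edge_type) triples
    Edge types are illustrative assumptions, not the paper's definitions.
    """
    tree = ast.parse(source)
    nodes = []
    ids = {}

    # Assign one integer id per distinct AST node object.
    for node in ast.walk(tree):
        if id(node) not in ids:
            ids[id(node)] = len(nodes)
            nodes.append(type(node).__name__)

    edges = set()
    for node in ast.walk(tree):
        # Plain syntax edges: parent -> child.
        for child in ast.iter_child_nodes(node):
            edges.add((ids[id(node)], ids[id(child)], "ast_child"))

        # Rough control-flow approximation: statement -> next statement
        # within the same block.
        body = getattr(node, "body", None)
        if isinstance(body, list):
            for first, second in zip(body, body[1:]):
                edges.add((ids[id(first)], ids[id(second)], "next_stmt"))

    # Rough data-flow approximation: most recent textual write of a
    # variable -> later read of the same variable. Function parameters
    # and evaluation order within a statement are ignored here.
    names = sorted(
        (n for n in ast.walk(tree) if isinstance(n, ast.Name)),
        key=lambda n: (n.lineno, n.col_offset),
    )
    last_store = {}
    for name in names:
        if isinstance(name.ctx, ast.Store):
            last_store[name.id] = ids[id(name)]
        elif isinstance(name.ctx, ast.Load) and name.id in last_store:
            edges.add((last_store[name.id], ids[id(name)], "data_flow"))

    return nodes, edges


if __name__ == "__main__":
    nodes, edges = build_flow_augmented_graph("a = 1\nb = a + 2\n")
    for src, dst, kind in sorted(edges):
        print(f"{nodes[src]}[{src}] -{kind}-> {nodes[dst]}[{dst}]")
```

In a graph-based detector, two code fragments would each be converted to such a graph and then compared by a learned model; the typed edges give the model access to execution order and variable dependencies that the bare syntax tree does not expose.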
