FSD-CLCD: Functional semantic distillation graph learning for cross-language code clone detection

Linghao Zhang, Senlin Luo, Limin Pan, Zhouting Wu, Kun Gong

doi:10.1016/j.engappai.2024.108199

Abstract

Code clone detection finds identical or similar code snippets, which matters for analyzing homologous components, discovering redundant code, and improving the efficiency of software development and maintenance. A crucial challenge is extracting functional semantic similarity from code under heterogeneous conditions, such as a cross-language scenario. Existing methods mainly use sequence models that compare code pairs with only lexical and statistical features; they are susceptible to linguistic feature noise and misclassify code pairs that have similar structural dependencies, such as control flow. Meanwhile, capturing structure-dependent features runs into inconsistent node types and large variation in node counts, which misaligns the distribution of clone pairs and weakens detection precision. This work presents a novel cross-language code clone detection method. It represents code as a graph built from abstract syntax trees and introduces a global node to strengthen the connection between control flows. It prunes the graph based on key node protection rules to reduce the impact of linguistic feature noise. It also optimizes graph matching networks for cross-language abstract syntax trees with a contrastive loss that aligns the functional semantic distribution of clone pairs. The method distills invariant functional semantic similarity despite the large discrepancy between code graphs in heterogeneous cross-language conditions. Experimental results show that the proposed method achieves a precision of 0.95, a recall of 0.98, and an F1-score of 0.96, and substantially outperforms the state-of-the-art baselines.
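
The abstract names three ingredients: an AST-based graph with a global node, pruning under key node protection rules, and a contrastive loss over clone pairs. The sketch below illustrates the general shape of each under loudly stated assumptions; it is not the authors' implementation. It uses Python's built-in `ast` module as a stand-in for a single-language parser (a cross-language setup would need a parser per language), and the names `CONTROL_FLOW`, `PROTECTED`, `build_graph`, and `contrastive_loss` are hypothetical. The pruning rule (drop unprotected single-child wrapper nodes and syntax context markers) and the classic margin-based pairwise loss are illustrative choices, since the abstract does not specify the paper's exact rules or loss form.

```python
import ast
import math

# Assumption: which node types count as control flow / "key" nodes.
# The abstract does not list the paper's protection rules.
CONTROL_FLOW = (ast.If, ast.For, ast.While, ast.Try, ast.Return)
PROTECTED = CONTROL_FLOW + (ast.Module, ast.FunctionDef)


def build_graph(source, prune=True):
    """Return (labels, edges); node 0 is a global node linked to control flow."""
    labels = ["GLOBAL"]
    edges = []  # (parent_id, child_id) pairs

    def visit(node, parent):
        # When pruning, skip Load/Store/Del markers (treated here as noise).
        children = [c for c in ast.iter_child_nodes(node)
                    if not (prune and isinstance(c, ast.expr_context))]
        droppable = len(children) == 1 and not isinstance(node, PROTECTED)
        if prune and droppable:
            # Splice out the wrapper: its child attaches to the kept ancestor.
            current = parent
        else:
            current = len(labels)
            labels.append(type(node).__name__)
            if parent is not None:
                edges.append((parent, current))
            if isinstance(node, CONTROL_FLOW):
                edges.append((0, current))  # global node -> control-flow node
        for child in children:
            visit(child, current)

    visit(ast.parse(source), None)
    return labels, edges


def contrastive_loss(u, v, is_clone, margin=1.0):
    """Classic margin-based pairwise contrastive loss on two embeddings."""
    d = math.dist(u, v)  # Euclidean distance
    # Clone pairs are pulled together; non-clones are pushed past the margin.
    return d ** 2 if is_clone else max(0.0, margin - d) ** 2


SAMPLE = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

full_labels, full_edges = build_graph(SAMPLE, prune=False)
labels, edges = build_graph(SAMPLE)
print(len(full_labels), "nodes before pruning,", len(labels), "after")
print("global node links to:", sorted(labels[c] for p, c in edges if p == 0))
```

In this sketch the protected `Return` node survives pruning even though it has a single child, while the unprotected `arguments` wrapper is spliced out. That is the kind of distinction a key node protection rule is meant to draw.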

Journal: Engineering Applications of Artificial Intelligence