OneSpace: Detecting cross-language clones by learning a common embedding space

Mohammed El Arnaoty, Francisco Servant

doi:10.1016/j.jss.2023.111911

Abstract

Identifying cloned code fragments across different programming languages can improve the productivity of software developers in several ways. However, clone detection is usually studied within a single language and is less explored for code snippets that span different languages. In this paper, we present OneSpace, a new cross-language clone detection approach. OneSpace projects different programming languages into the same embedding space using both code and API data, and then uses a Siamese network to infer the similarity of the embedded programs. We evaluate OneSpace by detecting clones across three language pairs: Java-Python, Java-C++, and Java-C. We compare OneSpace with two state-of-the-art techniques, SupLearn and CLCDSA. In our evaluation, OneSpace was more effective than the state of the art. Our ablation study validated some of the intuitions behind OneSpace's design, in particular that using a single embedding space (as opposed to separate ones) yields higher effectiveness. Additionally, we designed a variant of OneSpace that uses the Word Mover's Distance algorithm; it is less effective but much more efficient. We also found that OneSpace is more effective than the state of the art even for complex implementations, single-method implementations, varying ratios of positive to negative clone pairs in training, varying amounts of training data, and additional programming languages.
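The core idea in the abstract, one embedding space shared by every language plus a Siamese network that scores a pair of programs with the same weights on both sides, can be illustrated with a minimal pure-Python sketch. Everything in it is an assumption for illustration: the hypothetical `SharedSiameseScorer` class, the token lists, the mean pooling, the single tanh layer, cosine similarity, and the 0.5 threshold are not taken from the paper, and the weights are random rather than trained, so the scores only become meaningful after training on labeled clone pairs.

```python
import math
import random

# Illustrative only: the vocabulary, dimension, pooling, layer, and threshold
# below are assumptions for this sketch, not OneSpace's actual architecture.
DIM = 8
UNK = "<unk>"


class SharedSiameseScorer:
    """Hypothetical scorer: one shared embedding space + one shared encoder."""

    def __init__(self, vocab, dim=DIM, seed=0):
        rng = random.Random(seed)
        # One token table for every language (Java, Python, C++, C, ...);
        # tokens may be code tokens or API names.
        tokens = [UNK] + sorted(set(vocab) - {UNK})
        self.index = {tok: i for i, tok in enumerate(tokens)}
        self.embed = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in tokens]
        # One projection layer whose weights both Siamese branches reuse.
        self.weight = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def encode(self, tokens):
        """Mean-pool the token embeddings, then apply the shared tanh layer."""
        ids = [self.index.get(t, self.index[UNK]) for t in tokens] or [self.index[UNK]]
        dim = len(self.weight)
        pooled = [sum(self.embed[i][d] for i in ids) / len(ids) for d in range(dim)]
        return [math.tanh(sum(w * x for w, x in zip(row, pooled))) for row in self.weight]

    def similarity(self, tokens_a, tokens_b):
        """Cosine similarity of the two encodings; both use identical weights."""
        a, b = self.encode(tokens_a), self.encode(tokens_b)
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def is_clone(self, tokens_a, tokens_b, threshold=0.5):
        """Flag a pair as a clone when its similarity reaches the threshold."""
        return self.similarity(tokens_a, tokens_b) >= threshold


# Usage: hypothetical token sequences from a Java method and a Python function.
java_tokens = ["int", "for", "return", "Math.max"]
python_tokens = ["def", "for", "return", "max"]
scorer = SharedSiameseScorer(java_tokens + python_tokens)
print(scorer.similarity(java_tokens, python_tokens))
```

Because both inputs pass through the same `encode` weights, a Java snippet and a Python snippet land in one vector space, which is the property the abstract's ablation study credits for higher effectiveness; a trained model would learn `embed` and `weight` from labeled clone pairs instead of drawing them at random.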

Journal: Journal of Systems and Software
Publication Date: Nov 22, 2023
Citations: 2

Similar Papers

Text to image generative model using constrained embedding space mapping
Subhajit Chaudhury ... Ryuki Tachibana
01 Sep 2017

Neural Retrieval with Partially Shared Embedding Spaces
Bo Li ... Le Jia
17 Oct 2018

Cross-Modal Image Retrieval Considering Semantic Relationships With Many-to-Many Correspondence Loss
Huaying Zhang ... Rintaro Yanagi
IEEE Access | VOL. 11 | 01 Jan 2023

Common Latent Embedding Space for Cross-Domain Facial Expression Recognition
Run Wang ... Peng Song
IEEE Transactions on Computational Social Systems | VOL. 11 | 01 Apr 2024