Viewing functions as token sequences to highlight similarities in source code

Michel Chilowicz, Étienne Duris, Gilles Roussel

doi:10.1016/j.scico.2012.11.008

Abstract

The detection of similarities in source code has applications not only in software re-engineering (to eliminate redundancies) but also in software plagiarism detection. The latter can be a challenging problem, since more or less extensive edits may have been performed on the original copy: insertion or removal of useless chunks of code, rewriting of expressions, transposition of code, inlining and outlining of functions, etc. In this paper, we propose a new similarity detection technique based not only on token sequence matching but also on the factorization of the function call graphs. The factorization process merges shared chunks (factors) of code to cope, in particular, with inlining and outlining. The resulting call graph offers a view of the similarities with their nesting relations. It is useful for inferring metrics that quantify similarity at the function level.
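To make the idea of token sequence matching concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it only abstracts two functions into token sequences and extracts their longest shared contiguous run of tokens (one "factor"), without any call graph factorization. The function names `abstract_tokens` and `longest_shared_factor`, the use of Python's standard `tokenize` module, and the choice to map every identifier to `ID` are assumptions made for illustration.

```python
import io
import keyword
import tokenize


def abstract_tokens(source: str) -> list[str]:
    """Turn Python source into a sequence of abstract tokens.

    Assumption for this sketch: identifiers become "ID", numbers "NUM",
    strings "STR"; keywords and operators are kept; layout is ignored.
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        elif tok.type == tokenize.OP:
            out.append(tok.string)
        # NEWLINE, INDENT, DEDENT, COMMENT, etc. are deliberately skipped
    return out


def longest_shared_factor(a: list[str], b: list[str]) -> list[str]:
    """Longest contiguous run of tokens that occurs in both sequences
    (classic longest-common-substring dynamic programme)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


# A function and a copy with renamed identifiers plus one inserted statement.
original = (
    "def total(xs):\n"
    "    s = 0\n"
    "    for x in xs:\n"
    "        s = s + x\n"
    "    return s\n"
)
edited = (
    "def sum_all(values):\n"
    "    acc = 0\n"
    "    print('debug')\n"
    "    for v in values:\n"
    "        acc = acc + v\n"
    "    return acc\n"
)

tokens_a = abstract_tokens(original)
tokens_b = abstract_tokens(edited)
factor = longest_shared_factor(tokens_a, tokens_b)
# Share of the original function covered by the single largest factor.
coverage = len(factor) / len(tokens_a)
```

Renaming identifiers does not change the abstract token sequence, so only the inserted statement splits the match. A single longest factor is a crude measure; the paper's approach goes further by merging shared factors across the function call graph to handle inlining and outlining.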

Journal: Science of Computer Programming
Publication Date: Dec 22, 2012
Citations: 6
License type: other-oa

Similar Papers

Software plagiarism detection in multiprogramming languages using machine learning approach
Farhan Ullah ... Shehzad Khalid
Concurrency and Computation: Practice and Experience | VOL. 33
15 Oct 2018

Similarity of Source Code in the Presence of Pervasive Modifications
Chaiyong Ragkhitwetsagul ... Jens Krinke
01 Oct 2016

Function Level Cross-Modal Code Similarity Detection with Jointly Trained Deep Encoders
Zhenzhou Tian ... Lumeng Wang
01 Jan 2023

Measuring Code Similarity in Large-Scaled Code Corpora
Chaiyong Ragkhitwetsagul
01 Oct 2016
