A Proposed Model for Source Code Reuse Detection in Computer Programs

Zahra Setoodeh, Mohammad Reza Moosavi, Mostafa Fakhrahmad, Mohammad Bidoki

doi:10.1007/s40998-020-00403-8

Abstract

Source code reuse detection has become increasingly important as a common plagiarism prevention practice in academic research. For a large collection of source code files, manual detection of code reuse is impractical, so automatic and highly accurate tools are needed. This paper introduces a structure-based approach for recognizing source code (SOCO) reuse in reference programs. The proposed model consists of three main phases: preprocessing, sequence generation, and decision-making based on estimated similarities. First, the important instructions in each code file are identified, and the source code is converted to a string of specific tokens. A sequence alignment process is then carried out, and a tree representation of the source code is constructed. In the third phase, similarity values among the code files are estimated using three different innovative strategies based on both lexical and structural comparison of the source code. Finally, the system decides, for each pair of files, whether reuse has occurred. The SOCO-2014 corpus is used to evaluate the method. An experimental comparison of our model with the systems of the contest participants indicates that the proposed method's performance is acceptable and promising.
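To make the lexical part of this pipeline concrete, the following is a minimal Python sketch of the general idea: convert each file to a token string, align the two token sequences, and decide on the pair by thresholding the similarity. It is not the authors' implementation. The keyword set, the `ID`/`NUM` abstraction, the use of `difflib.SequenceMatcher` as a stand-in for the paper's sequence alignment, and the 0.8 threshold are all assumptions made for illustration; the tree-based structural comparison is not covered here.

```python
import re
from difflib import SequenceMatcher

# Hypothetical set of "important" keywords; the abstract does not list the real token set.
KEYWORDS = {"if", "else", "for", "while", "do", "switch", "case", "return"}


def tokenize(source: str) -> list[str]:
    """Convert source text to a coarse token sequence.

    Keywords are kept (upper-cased), every other identifier becomes ID,
    every number becomes NUM, and punctuation is kept character by character.
    """
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source):
        if lexeme in KEYWORDS:
            tokens.append(lexeme.upper())
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            tokens.append("ID")
        elif lexeme[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(lexeme)
    return tokens


def similarity(source_a: str, source_b: str) -> float:
    """Similarity in [0, 1] from matching blocks of the two token sequences.

    SequenceMatcher is used here as a simple stand-in for a real
    sequence alignment algorithm.
    """
    matcher = SequenceMatcher(None, tokenize(source_a), tokenize(source_b), autojunk=False)
    return matcher.ratio()


def is_reuse(source_a: str, source_b: str, threshold: float = 0.8) -> bool:
    """Decide on a pair of files; the threshold value is an assumed example."""
    return similarity(source_a, source_b) >= threshold
```

Because identifiers are abstracted to a single token, a copy in which only the variable and function names were changed maps to the same token sequence as the original and is flagged as reuse, which is the kind of disguise a purely textual comparison would miss.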

Full Text