IR-based technique for linearizing abstract method invocation in plagiarism-suspected source code pair

Oscar Karnalim

doi:10.1016/j.jksuci.2018.01.012

Abstract

According to several works, the low-level approach is an effective and efficient solution for detecting source code plagiarism. Instead of relying on source code tokens, it compares the executable form of the given code; that form contains only semantic-preserving tokens and is resistant to various plagiarism attacks. However, to our knowledge, the issue of statically linearizing abstract methods (i.e., replacing each abstract method invocation with the content of its respective invoked method without considering invocation semantics) has not been handled comprehensively. This issue can, to some extent, lead to inaccurate plagiarism detection results when handling object-oriented source code. This paper aims to solve the issue locally, per plagiarism-suspected pair. The proposed technique generates all possible linearization pair alternatives and selects the correct one through IR-based similarity. According to our evaluation in terms of the reversed number of mismatched tokens, the number of false positives, and the number of processes, the proposed technique is more effective and efficient than the state-of-the-art and combinatoric techniques. In addition, each IR mechanism used in the proposed technique is observed to have its own exclusive benefit for selecting the correct linearized forms.
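
The selection step described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that IR refers to information retrieval, that each candidate linearization is a list of low-level tokens, that similarity is the cosine of raw term-frequency vectors, and that the "correct" pair is the most similar one. The function names `cosine_similarity` and `select_linearization_pair` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity of two token lists using raw term frequencies
    (an assumed IR measure; the abstract does not name a specific one)."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = sqrt(sum(c * c for c in freq_a.values()))
    norm_b = sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_linearization_pair(alternatives_a, alternatives_b):
    """Try every pairing of candidate linearizations from the two programs
    and return (candidate_a, candidate_b, score) for the most similar pair.
    Returns None if either list of alternatives is empty."""
    best = None
    for cand_a, cand_b in product(alternatives_a, alternatives_b):
        score = cosine_similarity(cand_a, cand_b)
        if best is None or score > best[2]:
            best = (cand_a, cand_b, score)
    return best


# Hypothetical example: one abstract invocation in each program can resolve
# to two different method bodies, giving two linearized forms per program.
alternatives_a = [
    ["aload", "getfield", "dmul", "dreturn"],
    ["aload", "getfield", "getfield", "imul", "ireturn"],
]
alternatives_b = [
    ["aload", "getfield", "getfield", "imul", "ireturn"],
    ["aload", "invokestatic", "dreturn"],
]
best_a, best_b, score = select_linearization_pair(alternatives_a, alternatives_b)
```

In this toy input the second alternative of the first program and the first alternative of the second program are identical token sequences, so they form the selected pair. The exhaustive pairing shown here only illustrates the idea of choosing among linearization alternatives by similarity; the token names are made up for illustration.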

Full Text