Abstract

Source code similarity detection has extensive applications in computer programming teaching and software intellectual property protection. In computer programming courses, students may use complex source code obfuscation techniques, e.g., opaque predicates, loop unrolling, and function inlining and outlining, to reduce the similarity between code fragments and evade plagiarism detection. Existing source code similarity detection approaches consider only static features of source code, which makes it difficult for them to cope with these more complex obfuscation techniques. In this paper, we propose a novel source code similarity detection approach that uses process mining to capture the dynamic, runtime features of source code. More specifically, given two pieces of source code, their running logs are obtained through source code instrumentation and execution. Next, process mining is applied to the collected running logs to obtain a flow chart for each piece of source code. Finally, the similarity of the two pieces of source code is measured by computing the similarity of the two flow charts. Experimental results show that the proposed approach can handle more complex obfuscation techniques, including opaque predicates, loop unrolling, and function inlining and outlining, which existing work cannot handle properly. We therefore argue that our approach defeats commonly used code obfuscation techniques more effectively than existing state-of-the-art source code similarity detection approaches.
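The pipeline described in the abstract (running logs, then process mining, then flow-chart similarity) can be sketched minimally as follows. This is an illustration under stated assumptions, not the paper's method: the discovered "flow chart" is approximated by a directly-follows graph (DFG), one of the simplest process discovery techniques, and similarity is a weighted Jaccard index over edge frequencies. The summary does not name the actual mining algorithm or graph similarity measure, and the function names and activity labels below are hypothetical.

```python
from collections import Counter

def directly_follows_graph(log):
    """Discover a directly-follows graph (DFG) from an event log.

    `log` is a list of traces; each trace is the list of activity names
    recorded during one run of an instrumented program. The result maps
    each edge (a, b) to how often activity b directly followed activity a.
    """
    dfg = Counter()
    for trace in log:
        # Artificial start/end markers make entry and exit behavior count too.
        padded = ["<start>", *trace, "<end>"]
        for a, b in zip(padded, padded[1:]):
            dfg[(a, b)] += 1
    return dfg

def graph_similarity(g1, g2):
    """Weighted Jaccard similarity of two DFGs' edge frequencies, in [0, 1]."""
    edges = set(g1) | set(g2)
    union = sum(max(g1[e], g2[e]) for e in edges)
    if union == 0:
        return 1.0  # two empty graphs are treated as identical
    return sum(min(g1[e], g2[e]) for e in edges) / union

# Hypothetical running logs: one trace per execution.
log_a = [["read", "add", "add", "add", "print"]]  # e.g. a summing loop
log_b = [["read", "add", "add", "add", "print"]]  # the same loop, unrolled
log_c = [["read", "mul", "print"]]                # a different program

dfg_a, dfg_b, dfg_c = (directly_follows_graph(lg) for lg in (log_a, log_b, log_c))
print(graph_similarity(dfg_a, dfg_b))  # → 1.0
print(graph_similarity(dfg_a, dfg_c))  # → 0.25
```

Weighting edges by frequency means that runs on inputs of different sizes produce different counts. Comparing only the edge sets (plain Jaccard) is one alternative; this summary does not say which choice, if either, the paper makes.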

Highlights

  • Research on source code similarity detection can be traced back to the 1970s, and such techniques are widely applied to source code plagiarism detection in computer programming teaching and to software intellectual property protection

  • We propose a novel source code similarity detection approach for computer programming teaching using process mining. The dynamic features of source code are obtained by running the code, and they serve as the basis for measuring the similarity between two code fragments

  • Related Work: Anti-obfuscation performance is an important metric for evaluating source code similarity detection [19]. Therefore, we first summarize the most commonly used code obfuscation techniques. Then, we introduce existing source code similarity detection approaches and their ability to resist code obfuscation, and on that basis we summarize the problems of existing approaches
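The highlights describe obtaining dynamic features by running the code. As a minimal sketch of collecting a running log, assuming Python as the target language and function-call events as the logged activities, the standard `sys.settrace` hook can stand in for source-level instrumentation. `collect_running_log`, `square`, and `sum_squares` are hypothetical names for illustration, not the paper's instrumenter.

```python
import sys

def collect_running_log(func, *args):
    """Run func(*args); return (result, trace), where trace lists the names
    of Python functions entered during the run, in call order."""
    trace = []

    def tracer(frame, event, arg):
        if event == "call":
            trace.append(frame.f_code.co_name)
        # Returning None disables per-line tracing inside each frame.
        return None

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always remove the hook, even on exceptions
    return result, trace

# Hypothetical program under test.
def square(x):
    return x * x

def sum_squares(n):
    total = 0
    for i in range(n):
        total += square(i)  # built-ins such as range() emit no "call" events
    return total

result, trace = collect_running_log(sum_squares, 3)
print(result, trace)  # → 5 ['sum_squares', 'square', 'square', 'square']
```

A call-level log like this is coarse: inlining `square` into `sum_squares` would change it. An instrumenter meant to resist function inlining and outlining would presumably record finer-grained events, such as the operations actually executed.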


Summary

Introduction

Research on source code similarity detection can be traced back to the 1970s, and such techniques are widely applied to source code plagiarism detection in computer programming teaching and to software intellectual property protection. Students sometimes use complex code obfuscation techniques, e.g., opaque predicates, loop unrolling, and function inlining and outlining, to reduce the similarity between code fragments. Because existing approaches consider only static features of source code, they cannot cope with these complex obfuscation techniques. To solve this problem, we propose a novel source code similarity detection approach for computer programming teaching using process mining.

Among existing approaches, tree-based approaches mainly measure similarity through subtrees. As a result, they cannot resist structural obfuscation techniques [20], such as adding redundant statements and loop unrolling. An approach to measuring code similarity across programming languages has been proposed based on the static flow chart of source code [11]. This approach considers only the static flow chart of the code, which makes it difficult to resist opaque predicates and function inlining and outlining. To sum up, graph-based code similarity detection approaches cannot deal with opaque predicates, loop unrolling, and some other complex code obfuscation techniques.
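To illustrate why loop unrolling and opaque predicates defeat static comparison but not runtime logs, here is a small sketch. Its assumptions are stated plainly: a multiset of AST node types serves as a crude stand-in for tree-based fingerprints (it is not any specific published tool); `LogAugAssign` is a hypothetical instrumenter that logs an `add` event after every `+=` statement; and the unrolled variant is equivalent to the original only for three-element inputs.

```python
import ast
from collections import Counter

ORIGINAL = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

# Same behavior for 3-element inputs: the loop is unrolled by hand and
# wrapped in an opaque predicate whose condition is always true.
OBFUSCATED = """
def total(xs):
    s = 0
    if len(xs) * len(xs) >= 0:
        s += xs[0]
        s += xs[1]
        s += xs[2]
    return s
"""

def node_bag(src):
    """Static feature: multiset of AST node types (a crude tree fingerprint)."""
    return Counter(type(n).__name__ for n in ast.walk(ast.parse(src)))

def bag_similarity(b1, b2):
    """Weighted Jaccard similarity of two multisets, in [0, 1]."""
    keys = set(b1) | set(b2)
    union = sum(max(b1[k], b2[k]) for k in keys)
    return sum(min(b1[k], b2[k]) for k in keys) / union if union else 1.0

class LogAugAssign(ast.NodeTransformer):
    """Hypothetical instrumenter: insert log('add') after every `x += ...`."""
    def visit_AugAssign(self, node):
        log_call = ast.Expr(value=ast.Call(func=ast.Name(id="log", ctx=ast.Load()),
                                           args=[ast.Constant(value="add")],
                                           keywords=[]))
        return [node, log_call]  # a statement visitor may return a list

def runtime_log(src, arg):
    """Instrument src, run total(arg), and return the recorded events."""
    tree = ast.fix_missing_locations(LogAugAssign().visit(ast.parse(src)))
    events = []
    namespace = {"log": events.append}
    exec(compile(tree, "<instrumented>", "exec"), namespace)
    namespace["total"](arg)
    return events

static = bag_similarity(node_bag(ORIGINAL), node_bag(OBFUSCATED))
dynamic_same = runtime_log(ORIGINAL, [1, 2, 3]) == runtime_log(OBFUSCATED, [1, 2, 3])
print(f"static similarity: {static:.2f}, identical runtime logs: {dynamic_same}")
```

The static score drops because the `For` node disappears and `If`, `Compare`, `Call`, and `Subscript` nodes appear, while both instrumented runs emit the same three `add` events.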

An Approach Overview
Experiment and Evaluation
Conclusion