Abstract
Source code similarity detection has extensive applications in computer programming teaching and software intellectual property protection. In the teaching of computer programming courses, students may utilize some complex source code obfuscation techniques, e.g., opaque predicates, loop unrolling, and function inlining and outlining, to reduce the similarity between code fragments and avoid the plagiarism detection. Existing source code similarity detection approaches only consider static features of source code, making it difficult to cope with more complex code obfuscation techniques. In this paper, we propose a novel source code similarity detection approach by considering the dynamic features at runtime of source code using process mining. More specifically, given two pieces of source code, their running logs are obtained by source code instrumentation and execution. Next, process mining is used to obtain the flow charts of the two pieces of source code by analyzing their collected running logs. Finally, similarity of the two pieces of source code is measured by computing the similarity of these two flow charts. Experimental results show that the proposed approach can deal with more complex obfuscation techniques including opaque predicates and loop unrolling as well as function inlining and outlining, which cannot be handled by existing work properly. Therefore, we argue that our approach can defeat commonly used code obfuscation techniques more effectively for source code similarity detection than the existing state-of-the-art approaches.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have