Abstract

The low-level approach is a recent technique for source code plagiarism detection. Instead of relying on source code tokens, it relies on the low-level structural representation produced by compiling the given source code. However, existing low-level approaches to date are unsuitable for large source files because their comparison takes either quadratic or cubic time. This paper presents a low-level approach with a more efficient comparison that takes only linear time, using Cosine Correlation in the Vector Space Model. Our evaluation yields three findings. First, the proposed approach can be more efficient and effective than the state-of-the-art approach. Second, a low-level structural representation can be more effective and efficient than a source code token sequence as a medium for source code plagiarism detection. Third, subsequence length (which is related to n in n-gram) can be inversely proportional to effectiveness without a significant impact on efficiency.
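
To make the comparison step concrete, the following is a minimal sketch of cosine similarity over n-gram count vectors, not the paper's actual implementation. It assumes each program has already been compiled and reduced to a flat list of low-level tokens. The function names `ngram_vector` and `cosine_similarity` and the opcode-like token lists are hypothetical and chosen only for illustration. Building each vector takes one pass over the tokens, and the dot product takes one pass over the distinct n-grams of the smaller vector, which is the sense in which such a comparison can run in linear time.

```python
from collections import Counter
from math import sqrt


def ngram_vector(tokens, n):
    """Count every contiguous n-token subsequence (n-gram) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    if not vec_a or not vec_b:
        return 0.0
    # Iterate over the smaller vector; only shared n-grams add to the dot product.
    small, large = (vec_a, vec_b) if len(vec_a) <= len(vec_b) else (vec_b, vec_a)
    dot = sum(count * large[gram] for gram, count in small.items())
    norm_a = sqrt(sum(c * c for c in vec_a.values()))
    norm_b = sqrt(sum(c * c for c in vec_b.values()))
    return dot / (norm_a * norm_b)


# Hypothetical low-level token sequences (made-up mnemonics, not real compiler output).
original = ["load", "load", "add", "store", "load", "push", "call", "return"]
verbatim_copy = list(original)  # identical low-level sequence
unrelated = ["push", "push", "mul", "jump", "pop", "return"]

n = 2  # subsequence length; an assumed value, not one taken from the paper
sim_copy = cosine_similarity(ngram_vector(original, n), ngram_vector(verbatim_copy, n))
sim_other = cosine_similarity(ngram_vector(original, n), ngram_vector(unrelated, n))
```

In this sketch a score near 1 indicates nearly identical n-gram distributions, and a score of 0 indicates no shared n-grams. Any threshold for flagging a pair as suspicious would have to be chosen empirically.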

