Abstract

Purpose: This research aimed to detect source code plagiarism from Abstract Syntax Trees (AST) using the Damerau-Levenshtein Distance algorithm, with the expectation of reducing the inaccuracy and time consumption associated with manual checking. Methods: The Damerau-Levenshtein Distance algorithm was used to determine the similarity between source code files, and F-Measure was calculated from the results. The dataset, which consisted of 178 source code files from 20 coursework assignments, was obtained from a GitHub repository by Lawton Nichols (2019). The algorithm computed the minimum cost required to transform one line of code into another. ANTLR was used to obtain the AST, which then went through preprocessing, including node pruning, function and variable sorting, and log output removal. Result: The two methods took 5.704 seconds and 0.996 seconds to complete. The lowest and highest F-Measure values obtained were 0.16 and 0.8, respectively. The system therefore performed detection quickly and detected common forms of code plagiarism effectively, but had difficulty with the more complex forms. Novelty: This research used AST and the Damerau-Levenshtein Distance algorithm to calculate 5 levels of similarity in Java source code. For further development, additional preprocessing steps are needed to prune unnecessary nodes and to detect code that is equivalent but written with different syntax.
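
As an illustration of the distance computation described in the Methods, the following is a minimal sketch, not the implementation used in this research. It assumes the restricted (optimal string alignment) variant of Damerau-Levenshtein Distance with unit costs, a length-normalized similarity score, and the standard F-Measure formula. The function names `osa_distance`, `similarity`, and `f_measure` are hypothetical, and the inputs may be strings or token sequences such as AST node labels.

```python
from typing import Sequence


def osa_distance(a: Sequence, b: Sequence) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions, and transpositions of
    adjacent items, each with cost 1 (an assumption for this sketch).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i items of a
    for j in range(cols):
        d[0][j] = j  # insert all j items of b

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent items.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a: Sequence, b: Sequence) -> float:
    """Normalize the distance into [0, 1]; 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F-Measure (F1) from true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom
```

For example, `osa_distance("ca", "ac")` counts a single transposition, and `similarity` can be applied either to two lines of code or to two flattened sequences of AST node labels, depending on the granularity chosen for comparison.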
