Abstract

Academic institutions, which routinely publish papers and journals, are natural testing grounds for plagiarism detection methods. Plagiarism occurs when someone uses another writer's words without giving that writer proper credit. The proliferation of free text editors and the growing availability of scientific material online have made plagiarism detection a pressing concern, and detecting plagiarism in source code is particularly difficult. Numerous academic studies have investigated plagiarism detection algorithms for identification systems and software source code. The proposed method combines TF-IDF transformation with K-means clustering, which groups similar lines of code together, and achieves 99.2% accuracy in detecting plagiarism in source code. Its results are significantly better than those produced by the random forest algorithm. It also outperformed the existing MOSS system when trained on 90% and 80% of the training set. The results are compared using precision, recall, and F-measure. The proposed system is implemented in Python using Jupyter Notebook 7, and a graphical user interface was designed and implemented to give users a friendly experience.
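The abstract does not give implementation details, so the following is only a minimal sketch of how TF-IDF weighting and K-means clustering could be combined to group similar source files. Everything named here (`tokenize`, `tfidf_vectors`, `kmeans`, `SAMPLES`) is hypothetical and not taken from the paper. The sketch uses only the Python standard library, a smoothed IDF, and the cosine-similarity (spherical) variant of K-means with deterministic seeding; the authors' system may well differ on each of these points.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(source):
    """Split source code into identifiers, integers, and single punctuation marks."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)


def tfidf_vectors(documents):
    """Return one unit-length TF-IDF vector (a token -> weight dict) per document."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(documents)
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        # Smoothed IDF (an assumption): tokens found in every file keep weight 1.
        vec = {
            tok: (cnt / len(toks)) * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1)
            for tok, cnt in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({tok: w / norm for tok, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def kmeans(vectors, k, iterations=20):
    """Cluster unit vectors by cosine similarity and return one label per vector."""
    # Deterministic farthest-first seeding (an assumption, chosen for repeatability).
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        farthest = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(dict(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        # Update step: each centroid becomes the normalised sum of its members.
        for j in range(k):
            total = defaultdict(float)
            for v, label in zip(vectors, labels):
                if label == j:
                    for tok, w in v.items():
                        total[tok] += w
            if total:  # an empty cluster keeps its previous centroid
                norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
                centroids[j] = {tok: w / norm for tok, w in total.items()}
    return labels


# Hypothetical inputs: two pairs of files that differ only in identifier names.
SAMPLES = [
    "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n",
    "def total(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n",
    "class Stack:\n    def __init__(self):\n        self.items = []\n"
    "    def push(self, item):\n        self.items.append(item)\n",
    "class Stack:\n    def __init__(self):\n        self.data = []\n"
    "    def push(self, value):\n        self.data.append(value)\n",
]

if __name__ == "__main__":
    print(kmeans(tfidf_vectors(SAMPLES), k=2))
```

In this sketch, files that land in the same cluster are candidates for closer inspection rather than confirmed cases of plagiarism. A real system would also need to choose the number of clusters and a decision threshold, neither of which the abstract specifies.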
