Plagiarism is a serious threat, especially to academic honesty, so a detection system that can analyze several types of documents is needed. This research develops a plagiarism detection system that uses Optical Character Recognition (OCR) to convert text in images into digital text. The Rabin-Karp algorithm with a rolling hash and the Dice coefficient are applied to measure the similarity between documents. Testing was carried out on .doc, .txt, and .jpg files. The results show that the system detects plagiarism well in clear text and image documents, but accuracy can decrease for low-quality images. In conclusion, similarity in content, sentence structure, and format affects the similarity score, and OCR works effectively, although its performance is limited on low-quality images.
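
As an illustration of the matching step described in the abstract, the following is a minimal sketch of how a Rabin-Karp style rolling hash over k-grams might be combined with the Dice coefficient. It is not the authors' implementation: the function names, the normalization step, the k-gram length `k=5`, the base `256`, and the modulus `1_000_003` are all assumptions chosen for illustration, and the input is assumed to be plain text that has already been extracted (for example by OCR).

```python
def kgram_hashes(text, k=5, base=256, mod=1_000_003):
    """Return the set of rolling hashes of every k-character window.

    All parameters are illustrative assumptions, not values from the paper.
    """
    # Assumed normalization: lowercase and keep only letters and digits.
    s = "".join(ch.lower() for ch in text if ch.isalnum())
    if len(s) < k:
        return set()

    # Weight of the leading character in a window: base^(k-1) mod mod.
    high = pow(base, k - 1, mod)

    # Hash of the first window, computed directly.
    h = 0
    for ch in s[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = {h}

    # Roll the window: drop the leading character, append the next one.
    for i in range(k, len(s)):
        h = (h - ord(s[i - k]) * high) % mod
        h = (h * base + ord(s[i])) % mod
        hashes.add(h)
    return hashes


def dice_similarity(a, b, k=5):
    """Dice coefficient 2|A∩B| / (|A| + |B|) over the two sets of k-gram hashes."""
    ha, hb = kgram_hashes(a, k), kgram_hashes(b, k)
    if not ha or not hb:
        # A text shorter than k characters yields no k-grams to compare.
        return 0.0
    return 2 * len(ha & hb) / (len(ha) + len(hb))
```

Calling `dice_similarity(doc_a, doc_b)` on two extracted texts returns a value between 0 and 1. Because the comparison is made on hash values rather than on the k-grams themselves, occasional hash collisions can slightly inflate the score; a stricter variant would verify matching k-grams character by character.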