Web-Based Application of the Rabin-Karp Method to Measure Word Similarity Between Two Documents

Abstract

Scientific writing is a significant part of academic requirements at colleges and takes several forms, such as papers, reports, journal articles, and undergraduate theses. Plagiarism, the copying of others' work, nevertheless remains prevalent in the writing of scientific papers. Plagiarism of digital content in particular often involves copy-pasting and quoting from original documents, so measuring the similarity of words between documents becomes essential. Dhamayanti's research recommends enhancing the Rabin-Karp algorithm by using a different method [1]. This study builds on that previous research, which employed a string-matching method. Instead of the conventional cosine method, this study substitutes string-matching techniques within the Rabin-Karp algorithm, which resulted in improved similarity percentages. The application is built on the string-matching method using the Rabin-Karp algorithm. The algorithm converts 5-gram word sequences into hash values and matches them, and the similarity percentage is determined from the matching hash values. Identical words in the two documents indicate similarity. The application was tested on six scientific writing documents from diverse sources with related titles. Across 15 test runs, the accuracy level reached 90%.
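The hashing and matching step described in the abstract can be illustrated with a minimal Python sketch. Several details are assumptions, not taken from the study: the grams are read as character 5-grams over normalized text (the abstract's "5-gram word sequences" could also mean five-word windows), the radix `BASE` and modulus `MOD` are arbitrary illustrative choices, and the percentage uses the Dice coefficient, which the abstract does not name. The function names are hypothetical.

```python
import re

BASE = 256        # assumed radix for the polynomial hash
MOD = 1_000_003   # assumed prime modulus
K = 5             # gram length stated in the abstract


def normalize(text):
    """Lowercase and drop everything except letters and digits (assumed preprocessing)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def kgram_hashes(text, k=K):
    """Return the set of Rabin-Karp rolling hash values of every k-gram in text."""
    s = normalize(text)
    if len(s) < k:
        return set()
    high = pow(BASE, k - 1, MOD)  # weight of the character that leaves the window
    h = 0
    for ch in s[:k]:              # hash of the first window
        h = (h * BASE + ord(ch)) % MOD
    hashes = {h}
    for i in range(k, len(s)):    # slide the window one character at a time
        h = (h - ord(s[i - k]) * high) % MOD
        h = (h * BASE + ord(s[i])) % MOD
        hashes.add(h)
    return hashes


def similarity(doc_a, doc_b, k=K):
    """Similarity percentage from matching hash values (Dice coefficient, an assumption)."""
    a, b = kgram_hashes(doc_a, k), kgram_hashes(doc_b, k)
    if not a or not b:
        return 0.0
    return 100.0 * 2 * len(a & b) / (len(a) + len(b))
```

Because only hash values are compared, two different grams that happen to collide would be counted as a match; a stricter variant would confirm each hash match by comparing the underlying strings.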
