The Comparation of Distance-Based Similarity Measure to Detection of Plagiarism in Indonesian Text

Tari Mardiana, Teguh Bharata Adji, Indriana Hidayah

doi:10.1007/978-3-662-46742-8_14

Abstract

Easy, unrestricted access to information on the Internet has led to rapid growth in plagiarism through the copy-paste-modify practice. Many methods, algorithms, and software tools have been developed to prevent and detect plagiarism, and they can be applied broadly without being restricted to a particular subject. Research on plagiarism detection for the Indonesian language continues to develop, although not as significantly as for English. This paper proposes several distance-based similarity measures that can be used to assess similarity in Indonesian text: Dice's similarity coefficient, cosine similarity, and the Jaccard coefficient. They are implemented together with the Rabin-Karp algorithm, which is commonly used to detect plagiarism in Indonesian text. Plagiarism is analyzed through fingerprint analysis: a document fingerprint is created according to a predetermined n-gram value, and the similarity value is then computed from the number of fingerprints that the texts share. A small text data set on the topic of Information Systems is tested, divided into four kinds of document with different modifications. The first document is the original text; the second combines 50% of the original text with 50% of another text; in the third, 50% of the original text is modified using synonyms and paraphrasing; in the fourth, the positions of some passages of the original text are changed. In the experimental results, cosine similarity performs better than the Dice coefficient and the Jaccard coefficient in terms of the accuracy of the generated values. This model is expected to serve as an alternative type of statistical algorithm that uses n-grams in its process, especially for detecting plagiarism in Indonesian text.

Keywords

Fingerprint, Indonesian, Plagiarism, Similarity, Text
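The fingerprinting pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the text normalisation, the gram length `k`, and the hash parameters `base` and `mod` are all assumptions chosen for illustration. Cosine similarity is computed here on sets of fingerprints (the binary-vector form), which may differ from the frequency-weighted form used in the paper.

```python
import math
import re


def kgram_hashes(text, k=5, base=256, mod=1_000_003):
    """Return the set of Rabin-Karp rolling hashes of all k-grams in text.

    The normalisation and the values of k, base and mod are illustrative
    assumptions, not settings taken from the paper.
    """
    # Normalise: lowercase, then drop everything except letters and digits.
    s = re.sub(r"[^a-z0-9]", "", text.lower())
    if len(s) < k:
        return set()
    high = pow(base, k - 1, mod)  # weight of the leading character
    h = 0
    for ch in s[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = {h}
    for i in range(k, len(s)):
        # Slide the window: remove s[i - k] on the left, append s[i] on the right.
        h = ((h - ord(s[i - k]) * high) * base + ord(s[i])) % mod
        hashes.add(h)
    return hashes


def dice(a, b):
    """Dice coefficient: 2|A n B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 0.0


def jaccard(a, b):
    """Jaccard coefficient: |A n B| / |A u B|."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def cosine(a, b):
    """Set-based cosine similarity: |A n B| / sqrt(|A| * |B|)."""
    return len(a & b) / math.sqrt(len(a) * len(b)) if (a and b) else 0.0


if __name__ == "__main__":
    original = "Sistem informasi adalah kombinasi dari teknologi informasi dan aktivitas orang."
    modified = "Sistem informasi merupakan gabungan dari teknologi informasi dan kegiatan orang."
    fa, fb = kgram_hashes(original), kgram_hashes(modified)
    for name, measure in (("dice", dice), ("jaccard", jaccard), ("cosine", cosine)):
        print(f"{name}: {measure(fa, fb):.3f}")
```

For any two non-empty fingerprint sets, the Jaccard value is never larger than the Dice value, and the Dice value is never larger than the set-based cosine value. The three measures therefore produce the same ranking direction on the same fingerprints, and a comparison of their accuracy depends on how the scores are judged against the known degree of modification.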

Full Text