Penerapan Cosine Similarity dan Pembobotan TF-IDF untuk Mendeteksi Kemiripan Dokumen (Application of Cosine Similarity and TF-IDF Weighting to Detect Document Similarity)

Abstract

Plagiarism is the act of taking part or all of someone else's ideas, in the form of documents or texts, without citing the source. This study aims to detect the similarity of text documents using the cosine similarity algorithm with TF-IDF weighting, so that the result can be used to estimate the degree of plagiarism. The documents compared are Indonesian-language abstracts. Preprocessing consists of case folding, tokenizing, stopword removal, and stemming. After preprocessing, the TF-IDF weights are calculated and the cosine similarity between the weighted documents is computed, giving a similarity value expressed as a percentage. The results show that applying stemming yields similarity values that are on average 10% higher than when stemming is omitted. Documents with a high degree of similarity produce similarity values above 50%, whereas documents with a low degree of similarity, or no plagiarism, produce values below 40%. The experiments indicate that cosine similarity with TF-IDF weighting can produce a similarity value for each compared document.
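The pipeline described in the abstract (preprocessing, TF-IDF weighting, cosine similarity) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the stopword list, the `stem` helper, and all function names are hypothetical, the stemmer is a crude suffix stripper standing in for a real Indonesian stemmer, and the smoothed IDF formula `log(N / df) + 1` is an assumption, since the abstract does not state which TF-IDF variant was used.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "untuk", "ini", "itu"}


def stem(token):
    """Placeholder stemmer: strips a few common Indonesian suffixes.

    A real system would use a proper Indonesian stemming algorithm.
    """
    for suffix in ("nya", "kan", "an", "i"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def preprocess(text, use_stemming=True):
    """Case folding, tokenizing, stopword removal, optional stemming."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOPWORDS]
    if use_stemming:
        tokens = [stem(t) for t in tokens]
    return tokens


def tfidf_vectors(docs_tokens):
    """Return one sparse TF-IDF vector (dict) per tokenized document.

    Assumption: raw term counts for TF and a smoothed IDF, log(N/df) + 1,
    so that terms shared by every document keep a non-zero weight.
    """
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    idf = {term: math.log(n_docs / df[term]) + 1.0 for term in df}
    vectors = []
    for tokens in docs_tokens:
        tf = Counter(tokens)
        vectors.append({term: tf[term] * idf[term] for term in tf})
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * vec_b.get(term, 0.0) for term, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_percent(text_a, text_b, use_stemming=True):
    """Similarity of two texts as a percentage between 0 and 100."""
    docs = [preprocess(text_a, use_stemming), preprocess(text_b, use_stemming)]
    vec_a, vec_b = tfidf_vectors(docs)
    return 100.0 * cosine_similarity(vec_a, vec_b)
```

Calling `similarity_percent(abstract_a, abstract_b)` with and without `use_stemming=True` gives a rough feel for why stemming tends to raise the score: inflected forms of the same root collapse into one term, so the two vectors share more dimensions.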

Journal: Jurnal Linguistik Komputasional (JLK)	Publication Date: Mar 26, 2019
Citations: 1	License type: cc-by-nc-sa

Similar Papers

Research on Flow Classification Model Based on Similarity and Machine Learning Algorithm
Meigen Huang ... Lingling Wu
26 Feb 2021

Song Recommendations Based on Artists with Cosine Similarity Algorithms and K-Nearest Neighbor
Gst Ayu Vida Mastrika Giri ... Muhammad Arief Budiman
JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) | VOL. 8
04 Feb 2020

Citation Analysis on Scientific Articles Using Cosine Similarity
Ulfa Mardatillah ... Ichsan Taufik
19 Aug 2021

Sustainable Development: A Semantics-aware Trends for Movies Recommendation System using Modern NLP
Shadi AlZu’b ... Amjed Zraiqat
International Journal of Advances in Soft Computing and its Applications | VOL. 14
28 Nov 2022
