Abstract

Identifying identical words in two or more texts is the first step in detecting plagiarism. Plagiarism detection software is commercially available but relatively expensive, and the free alternatives offer very limited features. A freely accessible word similarity detection system is therefore needed as an alternative. Pattern matching is one approach to finding word similarity between documents. Several algorithms can be used for this purpose, including the Winnowing algorithm, which is known to perform well in detecting word similarity. Winnowing is a hashing-based algorithm that applies a hash function and window formation to obtain fingerprints during pattern matching. Based on these fingerprints, the level of similarity can be calculated. Previous studies have mostly calculated similarity at the character level, while calculation at the word level is still limited. This research aims to measure the level of text similarity using the Winnowing algorithm with word-level trigrams. The results show that the Winnowing algorithm applied with word-level trigrams detected text similarity levels of 76.84%, 52.29%, 37.40%, and 19.29%, respectively. It can be concluded that pattern matching with the Winnowing algorithm and word-level trigrams can be used to measure the level of text similarity.
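
The pipeline outlined in the abstract (word-level trigrams, hashing, window formation, fingerprint comparison) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the window size of 4, the MD5-based hash, and the use of the Jaccard coefficient as the similarity measure are all assumptions made for illustration, since the abstract does not specify them.

```python
import hashlib
import re


def word_trigrams(text):
    """Lowercase the text, split it into words, and return word-level trigrams."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]


def stable_hash(gram):
    """Deterministic 32-bit hash of a trigram.

    Assumption: MD5 is used only because it is stable across runs;
    the hash function used in the study is not given in the abstract.
    """
    return int(hashlib.md5(gram.encode("utf-8")).hexdigest()[:8], 16)


def winnow(hashes, window=4):
    """Return the set of fingerprints: the minimum hash of each sliding window."""
    if not hashes:
        return set()
    # If the document has fewer hashes than the window size, use one window.
    window = min(window, len(hashes))
    fingerprints = set()
    for start in range(len(hashes) - window + 1):
        fingerprints.add(min(hashes[start:start + window]))
    return fingerprints


def similarity(text_a, text_b, window=4):
    """Similarity in percent, computed as the Jaccard coefficient of the
    two fingerprint sets (an assumed measure, chosen for illustration)."""
    fp_a = winnow([stable_hash(g) for g in word_trigrams(text_a)], window)
    fp_b = winnow([stable_hash(g) for g in word_trigrams(text_b)], window)
    if not fp_a and not fp_b:
        return 0.0
    return 100.0 * len(fp_a & fp_b) / len(fp_a | fp_b)
```

Calling `similarity(doc_a, doc_b)` on two strings returns a percentage between 0 and 100. Because this sketch stores fingerprints as a set of hash values, it ignores where in the document each fingerprint occurs; a fuller implementation would typically keep positions as well so that matching passages can be located.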
