Abstract

The Rabin-Karp algorithm is often used to measure the similarity between texts: it compares hash values of the string being searched for with hash values of the substrings of the text. The value of k in the k-gram step is often chosen arbitrarily, and because many values of k are possible when cutting terms, trying them one by one is time-consuming. In this research, a word-cutting test is performed on a script using k-grams from 0 to 8, and the results show how each value of k affects the resulting similarity percentage. The aim is to determine the effect of the k-gram size on the performance of Rabin-Karp in text matching. The test was carried out on 20 sentences and repeated 10 times, using the Dice coefficient as the text similarity measure. The conclusion is that k-grams 0 to 2 should not be used, because the basic principle of the k-gram is to cut the text into groups of k characters: with k of 0, 1, or 2 the pieces do not yet carry meaning, so they yield a high similarity percentage. Based on trials that sampled k-grams 0 to 8 on 10 test data sets, the researchers recommend k = 3 as the best among k-grams 0 to 8.
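The pipeline described above (cut each text into k-grams, hash them with a Rabin-Karp rolling hash, then compare the two sets of hashes with the Dice coefficient) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the preprocessing (lowercasing and dropping non-alphanumeric characters), the hash base and modulus, and the choice to return 0.0 when k is smaller than 1 or longer than the text are all assumptions made for the example.

```python
def clean(text):
    # Assumed preprocessing: lowercase and keep only letters and digits.
    return "".join(ch for ch in text.lower() if ch.isalnum())


def rolling_hashes(text, k, base=256, mod=1_000_003):
    """Rabin-Karp rolling hash of every overlapping k-gram of the cleaned text.

    base and mod are illustrative values, not taken from the paper.
    """
    s = clean(text)
    n = len(s)
    if k < 1 or k > n:
        return []  # assumption: no k-grams are defined for this k
    high = pow(base, k - 1, mod)  # weight of the leftmost character
    h = 0
    for ch in s[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    for i in range(k, n):
        # Remove the outgoing character, shift, then add the incoming one.
        h = (h - ord(s[i - k]) * high) % mod
        h = (h * base + ord(s[i])) % mod
        hashes.append(h)
    return hashes


def dice_similarity(text_a, text_b, k):
    """Dice coefficient 2|A & B| / (|A| + |B|) over the sets of k-gram hashes."""
    a = set(rolling_hashes(text_a, k))
    b = set(rolling_hashes(text_b, k))
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    # Hypothetical sentences, used only to show how similarity varies with k.
    first = "The quick brown fox jumps over the lazy dog"
    second = "A quick brown dog jumps over the lazy fox"
    for k in range(1, 9):
        print(k, round(dice_similarity(first, second, k), 3))
```

With very small k, unrelated texts share most of their k-grams simply because they use the same letters, which is consistent with the abstract's observation that small k values inflate the similarity percentage.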

