Abstract

Names are commonly used to identify persons and objects. Measuring similarity between texts is useful for retrieving names despite misspellings and variant spellings. Moreover, similarity between the names of health products may lead to safety issues for consumers. Although the Levenshtein algorithm has been used to measure the similarity between a pair of strings, it does not account for some factors that affect human perception. In this paper, the effects of substring position and character similarity are taken into account. A set of experiments was conducted using Thai herb names collected from a Thai herbal database. Similarity scores, expressed as percentages, were given by six evaluators and compared with the values produced by the original and modified Levenshtein algorithms. The results show that both factors affect human perception. For substring position, evaluators focused on the substrings shared by a pair of strings: when matching substrings occur at the same positions in both strings, a higher similarity score should be given. For character similarity, groups of similar Thai consonant letters are assigned weights between 0 and 1 based on the structure of the characters, since human perception responds to similarity between a pair of characters. The average similarity scores from the evaluators were closer to those of our proposed Levenshtein algorithm with character similarity than to those of the original algorithm. In conclusion, similarities calculated with the original Levenshtein algorithm should be adjusted based on substring position and character similarity.
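
The character-similarity adjustment described above can be illustrated with a minimal sketch. It is not the authors' implementation: the consonant groups, the single weight of 0.5, the function names, and the normalization by the longer string's length are all assumptions made for illustration, since the abstract does not give the actual groups or weights. The substring-position adjustment is not modeled here.

```python
from typing import Callable

# Hypothetical groups of visually similar Thai consonants. The groups and the
# weight of 0.5 are illustrative assumptions, not the values used in the paper.
SIMILAR_GROUPS = [
    ("กภถ", 0.5),
    ("ขฃช", 0.5),
    ("คฅดต", 0.5),
    ("บปษ", 0.5),
    ("พฟผฝ", 0.5),
]


def char_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1] for two single characters."""
    if a == b:
        return 1.0
    for group, weight in SIMILAR_GROUPS:
        if a in group and b in group:
            return weight
    return 0.0


def weighted_levenshtein(
    s: str, t: str, sim: Callable[[str, str], float] = char_similarity
) -> float:
    """Levenshtein distance where substituting a with b costs 1 - sim(a, b).

    Insertions and deletions keep the usual cost of 1. Strings are compared
    code point by code point, so Thai vowel and tone marks count as
    separate characters.
    """
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        curr = [float(i)]
        for j, b in enumerate(t, 1):
            delete = prev[j] + 1.0
            insert = curr[j - 1] + 1.0
            substitute = prev[j - 1] + (1.0 - sim(a, b))
            curr.append(min(delete, insert, substitute))
        prev = curr
    return prev[-1]


def similarity_percent(
    s: str, t: str, sim: Callable[[str, str], float] = char_similarity
) -> float:
    """Convert the distance to a percentage, normalized by the longer length.

    This normalization is an assumption; the abstract does not state one.
    """
    longest = max(len(s), len(t))
    if longest == 0:
        return 100.0
    return 100.0 * (1.0 - weighted_levenshtein(s, t, sim) / longest)
```

Passing a `sim` function that returns 1.0 for identical characters and 0.0 otherwise recovers the original Levenshtein distance, so the same code can produce both the baseline and the adjusted score for a pair of names.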

