Abstract

This paper surveys phonetic encoding algorithms, which determine how similar words are in sound (pronunciation). The algorithms fall into two groups: those that compare words and those that measure the distance between words. The paper describes word comparison algorithms (SoundEx, NYSIIS, Daitch–Mokotoff, Metaphone, and Polyphone) and distance algorithms (Levenshtein, Jaro, and N-grams). For each algorithm, it discusses the advantages and shortcomings and gives an analog for the Russian language. To eliminate the shortcomings common to phonetic encoding algorithms, the paper proposes operating not on the letter sequences of words but on the sequences of their elementary sounds. This change is expected to improve word recognition, record linkage, and word indexing by sound.
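To make the two groups concrete, the following is a minimal Python sketch of one algorithm from each: the commonly described American SoundEx code (a word comparison algorithm) and the Levenshtein edit distance (a distance algorithm). The function names `soundex` and `levenshtein` are illustrative choices, and the code follows the widely documented English-language rules. It is not the paper's own formulation and does not cover the Russian analogs.

```python
def soundex(word: str) -> str:
    """Return the 4-character American SoundEx code of a word (sketch)."""
    # Consonant groups that share a digit; vowels, Y, H and W carry no digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                 # the first letter is kept as-is
    prev = codes.get(letters[0], "")    # its digit still suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal digits
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code                     # a vowel resets prev to ""
    return (result + "000")[:4]         # pad or truncate to 4 characters


def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))      # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


print(soundex("Robert"), soundex("Rupert"))   # same code for similar-sounding names
print(levenshtein("kitten", "sitting"))
```

Both functions work on letters rather than sounds, which is the limitation the paper's proposal targets: names that sound alike receive matching codes only when their spellings happen to map to the same digit groups.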

