Abstract

Phonetic algorithms play an essential role in many applications, including name matching, database record linkage, spelling correction, and search recommendations. Since 1918, researchers have proposed many phonetic algorithms; Soundex, Match Rating Codex, NYSIIS, Metaphone, and Double Metaphone are among the most frequently used. These algorithms were developed primarily for English phonetics, and they perform well for their intended purposes. However, they do not support the Bengali language, and they perform poorly on Bengali names written phonetically in English script. Some recently proposed phonetic algorithms, e.g., NameSignificance and Modified NameSignificance, handle Bengali phonetic names, but they do not perform well on English names. Moreover, they do not support names written in Bengali script, i.e., Bengali Unicode. Bengali, known as Bangla among native speakers, is counted as the seventh most spoken language in the world, with more than 250 million speakers worldwide. The use of Bengali Unicode is increasing in Bangladesh and around the globe as computer use grows. For example, in different healthcare systems, a patient’s name may be stored either as an English representation of the Bengali name or in Bengali Unicode. A system that cannot process Bengali Unicode fails to link the records of the same patient across multiple databases, which is a problem for record linkage and entity matching. In this paper, we propose a novel phonetic algorithm, nameGist, which can efficiently encode Bengali phonetic names in English representation, Bengali Unicode names, and English phonetic names. We tested nameGist on various datasets containing Bengali phonetic names, Bengali Unicode names, English phonetic (American or British) names, and mixtures of these types. In each case, nameGist performed better than the other algorithms in terms of accuracy and F-measure. The algorithm can be used to solve record linkage and entity resolution problems effectively for Bengali, English, and mixed names.

