Implantation of indexing optimization technology for highly specialized terms based on Metaphone phonetical algorithm

Volodymyr Buriachok, Pavlo Skladannyi, Volodymyr Sokolov, Matin Hadzhyiev, Lidiia Kuzmenko

doi:10.15587/1729-4061.2019.181943

Abstract

When compiling databases, for example to meet the needs of healthcare establishments, a common problem arises in entering and further processing the first and last names of doctors and patients, which are highly specialized in both pronunciation and spelling. This is because people's names are not unique, their spelling does not follow any phonetic rules, and their length may differ from language to language. With the advent of the Internet, this situation has become critical and can result in multiple copies of an e-mail being sent to one address. The problem can be addressed with phonetic algorithms for comparing words (Daitch-Mokotoff, Soundex, NYSIIS, Polyphone, and Metaphone), as well as with the Levenshtein and Jaro algorithms and Q-gram-based algorithms, which make it possible to find distances between words. The most widespread among them are Soundex and Metaphone, which index words by their sound, taking the rules of pronunciation into consideration. By applying the Metaphone algorithm, an attempt has been made to optimize phonetic search for fuzzy-matching tasks, for example data deduplication in various databases and registries, in order to reduce the number of errors caused by incorrectly entered last names. An analysis of the most common last names reveals that some of them are of Ukrainian or Russian origin. At the same time, the rules by which names are pronounced and written in Ukrainian differ radically from those assumed by the basic algorithms for English, and differ quite significantly from those for Russian. That is why a phonetic algorithm should first of all take into consideration the peculiarities of how Ukrainian last names are formed, which is of special relevance now.
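To make the idea of indexing words by their sound concrete, the following is a minimal sketch of classic American Soundex, one of the algorithms named in the abstract. It is not the adapted Metaphone procedure that the study develops, and it handles only Latin letters, so Ukrainian names would have to be transliterated first. The names `soundex` and `group_by_key` and the sample last names are illustrative assumptions.

```python
from collections import defaultdict

# Consonant classes of classic American Soundex; vowels, H, W and Y have no code.
_CODES = {}
for _digit, _letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def soundex(name: str) -> str:
    """Return the 4-character Soundex key of a Latin-script name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (key + "000")[:4]  # pad with zeros, then truncate to 4 characters


def group_by_key(names):
    """Group names that share a phonetic key (candidate duplicates)."""
    groups = defaultdict(list)
    for n in names:
        groups[soundex(n)].append(n)
    return dict(groups)


if __name__ == "__main__":
    sample = ["Robert", "Rupert", "Shevchenko", "Shevchenco", "Kovalenko"]
    for key, members in group_by_key(sample).items():
        print(key, members)
```

In this sketch "Robert" and "Rupert" share the key R163, and the two spellings of "Shevchenko" share S125, so each pair would be flagged as a candidate duplicate. A key of this kind is tuned to English pronunciation, which is the limitation the abstract points to for Ukrainian last names.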

Highlights

A variety of mechanisms and approaches can be used to search for fuzzy matches between words and phrases: distance calculation by Levenshtein, Damerau-Levenshtein, or Hamming, similarity by Jaro or Jaro-Winkler, construction of Q-grams, etc. [1]
The optimization should be measured as the decrease in the volume of search indexes, relative to the full sample, that results from using a phonetic algorithm.
The following tasks have been set:
– to investigate the frequency of Ukrainian last names in the territory of modern Ukraine;
– to construct a phonetic algorithm for indexes using a sample of Ukrainian last names;
– to conduct experimental research and implement an optimization technology for the phonetic algorithm for indexes using a sample of Ukrainian last names;
– to conduct experimental research and implement an optimization technology for search queries related to medicinal products when two related languages mix.
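The optimization criterion stated in the highlights, a decrease in index volume relative to the full sample, can be illustrated with a short sketch. The function `index_reduction`, the deliberately crude `toy_key` function, and the sample names are all hypothetical and stand in for whatever phonetic algorithm is being evaluated; they are not the authors' procedure.

```python
def toy_key(name: str) -> str:
    """Hypothetical stand-in for a phonetic key: keep the first letter,
    drop later vowels, and collapse equal consonants that end up adjacent."""
    s = name.lower()
    out = s[:1]
    for ch in s[1:]:
        if ch in "aeiouy":
            continue
        if ch != out[-1]:
            out += ch
    return out


def index_reduction(names, key_fn) -> float:
    """Fraction by which the index shrinks when distinct names are
    replaced by distinct phonetic keys (0.0 means no reduction)."""
    unique_names = set(names)
    if not unique_names:
        return 0.0
    unique_keys = {key_fn(n) for n in unique_names}
    return 1.0 - len(unique_keys) / len(unique_names)


if __name__ == "__main__":
    sample = ["Melnyk", "Melnik", "Mellnyk", "Bondar", "Bonder", "Tkachenko"]
    print(f"index reduction: {index_reduction(sample, toy_key):.0%}")
```

Here six distinct spellings collapse to three keys, so the index is half the size of the full sample. A larger reduction is only useful if the merged names really are variants of one another, which is why the key function has to reflect the language in question.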

Summary

Introduction

A variety of mechanisms and approaches can be used to search for fuzzy matches between words and phrases: distance calculation by Levenshtein, Damerau-Levenshtein, or Hamming, similarity by Jaro or Jaro-Winkler, construction of Q-grams, etc. [1]. These algorithms are universal, and their use is justified when analyzing long strings over finite alphabets (including similarity analysis and the search for mutations in DNA and RNA). Phonetic algorithms can be used to detect n-multiple errors in typos and misspellings, but none of them [...]. That is why there is a task to simplify and unify the process of sound perception. This is primarily driven by the medical reform, within which the automation of medical records is carried out by heads of hospitals, doctors, pharmacists, laboratory staff, diagnostic and junior medical personnel, as well as by patients. It is an urgent task to improve medical information systems, first of all at the stages of entering new data and searching through existing databases.
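As an illustration of the distance-based approach mentioned at the start of this introduction, the sketch below computes the Levenshtein distance with the standard two-row dynamic-programming scheme. The function name and the example strings are assumptions made for illustration; the code works on any Unicode strings, Cyrillic included.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("Shevchenko", "Shevchenco"))
    print(levenshtein("Шевченко", "Шевченка"))
```

A distance of this kind compares one pair of strings at a time, so a search has to score the query against every candidate. A phonetic key, by contrast, is computed once per name and can be stored as an index.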

Literature review and problem statement

The aim and objectives of the study

Studying the frequency of using Ukrainian last names

Recommendations for introducing a phonetic algorithm for indexes

Discussion of results of studying a phonetic algorithm for indexes

Findings

Conclusions


Journal: Eastern-European Journal of Enterprise Technologies
Publication Date: Oct 29, 2019
Citations: 3
License type: cc-by


