Abstract

In this paper, we present methods for transliteration and back-transliteration. In Korean technical documents and web documents, many English and Japanese words are transliterated into Korean. Because these transliterated words are usually technical terms and proper nouns, they are often missing from dictionaries, so an automatic transliteration system is needed. Previous transliteration models limit the information used for each letter to a window of two or three letters. However, most transliteration phenomena cannot be explained by a single standard rule, especially in Korean. Different rules are applied to each transliteration depending on factors such as the origin of the word and the profession of its users. Restricting the information length may therefore lose the information that distinguishes one transliteration rule from another. In this paper, we propose a method that finds similar words with the longest overlap with an input word. To find similar words without losing the information of each transliteration rule, we use phoneme chunks of unlimited length. The input word is then transliterated by merging these phoneme chunks. In an English-to-Korean transliteration test, our method achieved 86% character accuracy and 53% word accuracy.
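To illustrate why chunks without a fixed length limit matter, the following is a minimal sketch, not the authors' implementation. It assumes a small hypothetical table `CHUNKS` that maps English letter sequences to Korean strings, and it uses a simple greedy left-to-right longest match (the hypothetical function `transliterate`) in place of the paper's similar-word search and chunk merging. The only bound on chunk length is the longest entry in the table, so a long chunk such as "data" is preferred over its single letters whenever it matches.

```python
# Hypothetical chunk table (illustration only): English letter sequences
# mapped to Korean transliterations. Long chunks carry word-specific rules;
# single letters act as a fallback.
CHUNKS = {
    "data": "데이터",
    "base": "베이스",
    "d": "드", "a": "아", "t": "트",
    "b": "브", "s": "스", "e": "에",
}


def transliterate(word, chunks):
    """Greedy longest-match segmentation followed by merging (a simplified
    stand-in for the paper's method, assumed for illustration)."""
    # No fixed window: the longest chunk in the table sets the upper bound.
    max_len = max(len(key) for key in chunks)
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate chunk first, then shorter ones.
        for length in range(min(max_len, len(word) - i), 0, -1):
            candidate = word[i:i + length]
            if candidate in chunks:
                pieces.append(chunks[candidate])
                i += length
                break
        else:
            raise ValueError(f"no chunk covers {word[i]!r} in {word!r}")
    # Merge the transliterated chunks into one Korean word.
    return "".join(pieces)


print(transliterate("database", CHUNKS))  # 데이터베이스
print(transliterate("bad", CHUNKS))       # 브아드 (single-letter fallback)
```

With the long chunks available, "database" is covered by two chunks that each preserve a word-specific spelling; a model limited to two- or three-letter windows would have to rebuild that spelling from shorter, less discriminative pieces.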
