Abstract

Proper name transliteration, the pronunciation-based translation of a proper name, is important to many multilingual natural language processing tasks, such as Statistical Machine Translation (SMT) and Cross-Lingual Information Retrieval (CLIR). The task is extremely challenging because of pronunciation differences between the source and target languages: a given proper name can lead to many different transliterations. Past research efforts have demonstrated a 30–50% error rate when using the top-1 reference for transliteration, and this error degrades the performance of many applications. In this paper, a novel approach is proposed to verify a given proper name transliteration pair using a discrete variant Hidden Markov Model (HMM) alignment. The state emission probabilities are derived from SMT phrase tables. The proposed method yields an Equal Error Rate (EER) of 3.73% on a test set of 300 matched and 1000 unmatched name pairs. By comparison, the commonly used SMT framework yields an EER of 6.5% under its best configuration, and the widely used edit distance approach has an EER of 22%. Our new method achieves high accuracy with low complexity, and provides an alternative for name transliteration in CLIR and in other cross-lingual natural language applications such as word alignment and machine translation.
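
As a rough illustration of the evaluation metric and of the edit distance baseline mentioned above, the sketch below scores name pairs with a normalized Levenshtein similarity and estimates the EER by sweeping a decision threshold. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`edit_distance`, `similarity`, `equal_error_rate`), the normalization by the longer string, and the threshold sweep are all illustrative choices, and the HMM alignment itself is not reproduced here.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]; normalizing by the longer string is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def equal_error_rate(matched_scores, unmatched_scores):
    """Sweep thresholds over the observed scores and return (eer, threshold)
    at the point where false accept and false reject rates are closest."""
    best = None
    for t in sorted(set(matched_scores) | set(unmatched_scores)):
        # A pair is accepted as a valid transliteration when score >= t.
        frr = sum(s < t for s in matched_scores) / len(matched_scores)
        far = sum(s >= t for s in unmatched_scores) / len(unmatched_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Made-up scores for illustration only; these are not data from the paper.
    matched = [0.9, 0.8, 0.7, 0.4]
    unmatched = [0.1, 0.2, 0.3, 0.5]
    print(equal_error_rate(matched, unmatched))
```

Any verifier that outputs a score per name pair, including an alignment-based one, could be passed through the same `equal_error_rate` routine, which is what makes EER a convenient common yardstick for comparing the three approaches.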
