Abstract

We address the task of jointly training transliteration models for multiple language pairs (multilingual transliteration). This is an instance of multi-task learning, where individual tasks (language pairs) benefit from sharing knowledge with related tasks. We focus on transliteration involving related tasks, i.e., languages that share writing systems and phonetic properties (orthographically similar languages). We propose a modified neural encoder-decoder model that maximizes parameter sharing across language pairs in order to effectively leverage orthographic similarity. We show that multilingual transliteration significantly outperforms bilingual transliteration in different scenarios (an average improvement of 58% across the languages we experimented with). We also show that multilingual transliteration models generalize well to languages and language pairs not encountered during training, and hence perform well on the zero-shot transliteration task. Finally, we show that further improvements can be achieved by using phonetic feature input.
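
To make the parameter-sharing idea concrete, here is a minimal sketch (our illustration, not the authors' exact architecture): a single character-level encoder-decoder whose embeddings, encoder, and decoder are shared by all language pairs, with the output language signalled to the decoder through a language embedding. All names and sizes here (MultilingualTransliterator, emb, hid) are hypothetical.

```python
import torch
import torch.nn as nn

class MultilingualTransliterator(nn.Module):
    """Sketch of an encoder-decoder that shares all parameters across
    language pairs; the target language is injected as an embedding
    concatenated to every decoder input, so one network serves many
    transliteration directions."""

    def __init__(self, vocab_size, num_langs, emb=64, hid=128):
        super().__init__()
        self.char_emb = nn.Embedding(vocab_size, emb)   # shared character vocabulary
        self.lang_emb = nn.Embedding(num_langs, emb)    # target-language signal
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.decoder = nn.LSTM(emb * 2, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab_size)

    def forward(self, src_ids, tgt_ids, tgt_lang):
        # src_ids, tgt_ids: (batch, time); tgt_lang: (batch,)
        _, state = self.encoder(self.char_emb(src_ids))
        dec_in = self.char_emb(tgt_ids)
        lang = self.lang_emb(tgt_lang).unsqueeze(1).expand_as(dec_in)
        dec_out, _ = self.decoder(torch.cat([dec_in, lang], dim=-1), state)
        return self.out(dec_out)  # logits over the shared character vocabulary

# Toy usage: batch of 2, a 100-character shared vocabulary, 4 languages.
model = MultilingualTransliterator(vocab_size=100, num_langs=4)
logits = model(torch.randint(0, 100, (2, 5)),
               torch.randint(0, 100, (2, 6)),
               torch.tensor([1, 3]))
print(logits.shape)  # torch.Size([2, 6, 100])
```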

Highlights

  • Transliteration is a key building block for multilingual and cross-lingual NLP since it is essential for (i) handling of names in applications like machine translation (MT) and cross-lingual information retrieval (CLIR), and (ii) user-friendly input methods

  • Unlike previous approaches, which pivot over bilingual transliteration models, we propose zero-shot transliteration that pivots over multilingual transliteration models (see the sketch after this list)

  • We observe that multilingual training substantially improves accuracy over bilingual training on all datasets
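
The pivoting idea in the second highlight can be sketched as follows (a toy illustration with stand-in models, not the paper's implementation): an unseen source-target pair is served by chaining two trained directions through a pivot language and summing probabilities over the intermediate candidates. The function names and probabilities below are hypothetical.

```python
from collections import defaultdict

def pivot_transliterate(word, src, tgt, pivot, src2pivot_topk, pivot2tgt_topk):
    """Zero-shot src->tgt transliteration: chain two trained multilingual
    directions through a pivot language, multiplying candidate probabilities
    and marginalizing over the intermediate (pivot) candidates."""
    scores = defaultdict(float)
    for inter, p1 in src2pivot_topk(word, src, pivot):
        for out, p2 in pivot2tgt_topk(inter, pivot, tgt):
            scores[out] += p1 * p2  # sum over pivot candidates
    return max(scores, key=scores.get)

# Toy stand-ins for the k-best outputs of trained multilingual models:
def src2pivot_topk(word, src, pivot):
    return [("mumbai", 0.7), ("mumbay", 0.2)]

def pivot2tgt_topk(word, pivot, tgt):
    table = {"mumbai": [("মুম্বই", 0.8)], "mumbay": [("মুম্বই", 0.5)]}
    return table.get(word, [])

print(pivot_transliterate("मुंबई", "hi", "bn", "en",
                          src2pivot_topk, pivot2tgt_topk))  # -> মুম্বই
```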

Summary

Introduction

Transliteration is a key building block for multilingual and cross-lingual NLP since it is essential for (i) handling of names in applications like machine translation (MT) and cross-lingual information retrieval (CLIR), and (ii) user-friendly input methods. No prior work exists on jointly training transliteration models for multiple language pairs (referred to as multilingual transliteration). Multilingual transliteration can be seen as an instance of multi-task learning, where training each language pair constitutes a task. Multi-task learning works best when the tasks are related to each other, so that sharing knowledge across tasks is beneficial; multilingual transliteration can therefore be expected to help when the languages involved are related. We identify such a natural and practically useful scenario: multilingual transliteration involving languages that are related on account of sharing writing systems and phonetic properties. We refer to such languages as orthographically similar languages.
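
As one concrete instance of such orthographic similarity, here is a minimal sketch assuming Brahmi-derived scripts such as Devanagari, Bengali, and Kannada (our choice of example, not stated in this summary): these scripts occupy parallel Unicode blocks, so corresponding characters sit at the same offset within their blocks, which is exactly the kind of structure a shared character representation can exploit.

```python
# Parallel Unicode blocks for some Brahmi-derived scripts (illustrative subset).
BLOCK_START = {"devanagari": 0x0900, "bengali": 0x0980, "kannada": 0x0C80}
BLOCK_SIZE = 0x80

def map_script(text, src, tgt):
    """Map characters across scripts via their shared in-block offset;
    characters outside the source block are passed through unchanged."""
    lo, delta = BLOCK_START[src], BLOCK_START[tgt] - BLOCK_START[src]
    return "".join(chr(ord(c) + delta) if lo <= ord(c) < lo + BLOCK_SIZE else c
                   for c in text)

print(map_script("कमल", "devanagari", "kannada"))  # kamala -> ಕಮಲ
```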
