Abstract

We address the task of jointly training transliteration models for multiple language pairs (multilingual transliteration). This is an instance of multi-task learning, where individual tasks (language pairs) benefit from sharing knowledge with related tasks. We focus on transliteration involving related tasks, i.e., languages that share writing systems and phonetic properties (orthographically similar languages). We propose a modified neural encoder-decoder model that maximizes parameter sharing across language pairs in order to effectively leverage orthographic similarity. We show that multilingual transliteration significantly outperforms bilingual transliteration in different scenarios (an average improvement of 58% across the languages we experimented with). We also show that multilingual transliteration models generalize well to languages and language pairs not encountered during training, and hence perform well on the zero-shot transliteration task. Finally, we show that further improvements can be achieved by using phonetic feature input.
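
To make the parameter-sharing idea concrete, here is a minimal sketch (our illustration, not the authors' exact architecture): a single character-level encoder-decoder whose embeddings, encoder, and decoder are shared by all language pairs, with the output language signalled to the decoder through a language embedding. All names and sizes here (MultilingualTransliterator, emb, hid) are hypothetical.

```python
import torch
import torch.nn as nn

class MultilingualTransliterator(nn.Module):
    """Sketch of an encoder-decoder that shares all parameters across
    language pairs; the target language is injected as an embedding
    concatenated to every decoder input, so one network serves many
    transliteration directions."""

    def __init__(self, vocab_size, num_langs, emb=64, hid=128):
        super().__init__()
        self.char_emb = nn.Embedding(vocab_size, emb)   # shared character vocabulary
        self.lang_emb = nn.Embedding(num_langs, emb)    # target-language signal
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.decoder = nn.LSTM(emb * 2, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab_size)

    def forward(self, src_ids, tgt_ids, tgt_lang):
        # src_ids, tgt_ids: (batch, time); tgt_lang: (batch,)
        _, state = self.encoder(self.char_emb(src_ids))
        dec_in = self.char_emb(tgt_ids)
        lang = self.lang_emb(tgt_lang).unsqueeze(1).expand_as(dec_in)
        dec_out, _ = self.decoder(torch.cat([dec_in, lang], dim=-1), state)
        return self.out(dec_out)  # logits over the shared character vocabulary

# Toy usage: batch of 2, a 100-character shared vocabulary, 4 languages.
model = MultilingualTransliterator(vocab_size=100, num_langs=4)
logits = model(torch.randint(0, 100, (2, 5)),
               torch.randint(0, 100, (2, 6)),
               torch.tensor([1, 3]))
print(logits.shape)  # torch.Size([2, 6, 100])
```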

Highlights

  • Transliteration is a key building block for multilingual and cross-lingual NLP since it is essential for (i) handling of names in applications like machine translation (MT) and cross-lingual information retrieval (CLIR), and (ii) user-friendly input methods

  • Unlike previous approaches, which pivot over bilingual transliteration models, we propose zero-shot transliteration that pivots over multilingual transliteration models (see the sketch after this list)

  • We observe that multilingual training substantially improves accuracy over bilingual training on all datasets
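
The pivoting idea in the second highlight can be sketched as follows (a toy illustration with stand-in models, not the paper's implementation): an unseen source-target pair is served by chaining two trained directions through a pivot language and summing probabilities over the intermediate candidates. The function names and probabilities below are hypothetical.

```python
from collections import defaultdict

def pivot_transliterate(word, src, tgt, pivot, src2pivot_topk, pivot2tgt_topk):
    """Zero-shot src->tgt transliteration: chain two trained multilingual
    directions through a pivot language, multiplying candidate probabilities
    and marginalizing over the intermediate (pivot) candidates."""
    scores = defaultdict(float)
    for inter, p1 in src2pivot_topk(word, src, pivot):
        for out, p2 in pivot2tgt_topk(inter, pivot, tgt):
            scores[out] += p1 * p2  # sum over pivot candidates
    return max(scores, key=scores.get)

# Toy stand-ins for the k-best outputs of trained multilingual models:
def src2pivot_topk(word, src, pivot):
    return [("mumbai", 0.7), ("mumbay", 0.2)]

def pivot2tgt_topk(word, pivot, tgt):
    table = {"mumbai": [("মুম্বই", 0.8)], "mumbay": [("মুম্বই", 0.5)]}
    return table.get(word, [])

print(pivot_transliterate("मुंबई", "hi", "bn", "en",
                          src2pivot_topk, pivot2tgt_topk))  # -> মুম্বই
```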

Summary

Introduction

Transliteration is a key building block for multilingual and cross-lingual NLP since it is essential for (i) handling of names in applications like machine translation (MT) and cross-lingual information retrieval (CLIR), and (ii) user-friendly input methods. No prior work exists on jointly training transliteration models for multiple language pairs (referred to as multilingual transliteration). Multilingual transliteration can be seen as an instance of multi-task learning, where training each language pair constitutes a task. Multi-task learning works best when the tasks are related to each other, so that sharing knowledge across tasks is beneficial; multilingual transliteration can therefore be expected to help when the languages involved are related. We identify such a natural and practically useful scenario: multilingual transliteration involving languages that are related on account of sharing writing systems and phonetic properties. We refer to such languages as orthographically similar languages.
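
As one concrete instance of such orthographic similarity, here is a minimal sketch assuming Brahmi-derived scripts such as Devanagari, Bengali, and Kannada (our choice of example, not stated in this summary): these scripts occupy parallel Unicode blocks, so corresponding characters sit at the same offset within their blocks, which is exactly the kind of structure a shared character representation can exploit.

```python
# Parallel Unicode blocks for some Brahmi-derived scripts (illustrative subset).
BLOCK_START = {"devanagari": 0x0900, "bengali": 0x0980, "kannada": 0x0C80}
BLOCK_SIZE = 0x80

def map_script(text, src, tgt):
    """Map characters across scripts via their shared in-block offset;
    characters outside the source block are passed through unchanged."""
    lo, delta = BLOCK_START[src], BLOCK_START[tgt] - BLOCK_START[src]
    return "".join(chr(ord(c) + delta) if lo <= ord(c) < lo + BLOCK_SIZE else c
                   for c in text)

print(map_script("कमल", "devanagari", "kannada"))  # kamala -> ಕಮಲ
```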
